Similar literature
 Found 20 similar documents; search took 31 ms
1.
A parallel algorithm for transforming an n × n symmetric matrix to tridiagonal form is described. The algorithm implements Givens rotations on a square array of n × n processors in such a way that the transformation can be performed in time O(n log n). The processors require only nearest-neighbor communication. The reduction to tridiagonal form could be the first step in the parallel solution of the symmetric eigenvalue problem in time O(n log n).
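To make the reduction concrete, here is a minimal sequential sketch of Givens-rotation tridiagonalization in Python. It is not the processor-array algorithm of the abstract: the function name `givens_tridiagonalize`, the list-of-lists matrix layout, and the rotation order are assumptions chosen for illustration only.

```python
import math

def givens_tridiagonalize(a):
    """Reduce a symmetric matrix (list of lists) to tridiagonal form.

    Sequential illustration only: for each column k, a rotation in the
    (k+1, q) plane zeroes entry (q, k), applied as a similarity transform.
    """
    n = len(a)
    t = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k in range(n - 2):
        p = k + 1
        for q in range(k + 2, n):
            if t[q][k] == 0.0:
                continue  # already zero, no rotation needed
            r = math.hypot(t[p][k], t[q][k])
            c, s = t[p][k] / r, t[q][k] / r
            # Left multiplication: mix rows p and q.
            for j in range(n):
                tp, tq = t[p][j], t[q][j]
                t[p][j] = c * tp + s * tq
                t[q][j] = -s * tp + c * tq
            # Right multiplication by the transpose: mix columns p and q.
            for i in range(n):
                tp, tq = t[i][p], t[i][q]
                t[i][p] = c * tp + s * tq
                t[i][q] = -s * tp + c * tq
    return t
```

Because every step is an orthogonal similarity transform, the result has the same eigenvalues as the input, which is why tridiagonalization can serve as a first step toward the symmetric eigenvalue problem.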

2.
3.
We present the design and analysis of a nearly-linear work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input an SDD n-by-n matrix A with m nonzero entries and a vector b, our algorithm computes a vector \(\tilde{x}\) such that \(\|\tilde{x} - A^{+}b\|_{A} \leq\varepsilon\cdot\|{A^{+}b}\|_{A}\) in \(O(m\log^{O(1)}{n}\log {\frac{1}{\varepsilon}})\) work and \(O(m^{1/3+\theta}\log\frac{1}{\varepsilon})\) depth for any \(\theta>0\), where \(A^{+}\) denotes the Moore-Penrose pseudoinverse of A. The algorithm relies on a parallel algorithm for generating low-stretch spanning trees or spanning subgraphs. To this end, we first develop a parallel decomposition algorithm that, in \(O(m\log^{O(1)} n)\) work and polylogarithmic depth, partitions a graph with n nodes and m edges into components with polylogarithmic diameter such that only a small fraction of the original edges are between the components. This can be used to generate low-stretch spanning trees with average stretch \(O(n^{\alpha})\) in \(O(m\log^{O(1)} n)\) work and \(O(n^{\alpha})\) depth for any \(\alpha>0\). Alternatively, it can be used to generate spanning subgraphs with polylogarithmic average stretch in \(O(m\log^{O(1)} n)\) work and polylogarithmic depth. We apply this subgraph construction to derive a parallel linear solver. By using this solver in known applications, our results imply improved parallel randomized algorithms for several problems, including single-source shortest paths, maximum flow, minimum-cost flow, and approximate maximum flow.

4.
Traditionally, the block-based medial axis transform (BB-MAT) and the chessboard distance transform (CDT) have been viewed as two completely different image computation problems, especially in three-dimensional (3D) space. In fact, they share some equivalent properties. The relationship between them is derived and proved for the first time in this paper. One of the significant properties is that the CDT of a 3D binary image V is equal to the BB-MAT of the image V', where V' denotes the inverse image of V. In a parallel algorithm, the cost is defined as the product of the time complexity and the number of processors used. The main contribution of this work is to reduce the costs of the 3D BB-MAT and 3D CDT algorithms proposed by Wang [65]. Based on the reverse-dominance technique, which is redefined from the dominance concept, we compute the 3D CDT by implementing the 3D BB-MAT algorithm first. For a 3D binary image of size \(N^3\), our parallel algorithm runs in \(O(\log N)\) time using \(N^3\) processors on the concurrent read exclusive write (CREW) parallel random access machine (PRAM) model to solve both the 3D BB-MAT and the 3D CDT problems. The resulting costs are lower than those of Wang's algorithms. To the best of our knowledge, these are the lowest known costs for 3D BB-MAT and 3D CDT algorithms. In parallel algorithms, the running time can be divided into computation time and communication time. Experimental results for the running, communication, and computation times at different problem sizes were obtained on an HP Superdome with SMP/CC-NUMA (symmetric multiprocessor/cache coherent non-uniform memory access) architecture. We conclude that a parallel computer (i.e., an SMP/CC-NUMA architecture or a cluster system) is more suitable for solving problems with large input sizes.
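For readers unfamiliar with the chessboard distance transform, the following brute-force Python sketch shows what is being computed. It assumes the common convention that each foreground voxel (value 1) receives the chessboard (Chebyshev) distance to the nearest background voxel (value 0); that convention, the function name, and the nested-list layout are assumptions, and the code is a sequential reference, not the PRAM algorithm of the abstract. Under the abstract's cost definition, that algorithm costs \(N^3 \cdot O(\log N) = O(N^3 \log N)\).

```python
from itertools import product

def chessboard_distance_transform(vol):
    """Brute-force 3D chessboard distance transform.

    vol[z][y][x] is 0 (background) or 1 (foreground). Assumes at least
    one background voxel exists; otherwise min() raises ValueError.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    coords = list(product(range(nz), range(ny), range(nx)))
    background = [(z, y, x) for z, y, x in coords if vol[z][y][x] == 0]
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z, y, x in coords:
        if vol[z][y][x] == 1:
            # Chessboard distance is the largest coordinate difference.
            out[z][y][x] = min(
                max(abs(z - bz), abs(y - by), abs(x - bx))
                for bz, by, bx in background
            )
    return out
```

This reference version does work proportional to the number of foreground voxels times the number of background voxels, so it is only suitable for checking faster implementations on small volumes.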

5.
Maximal outerplanar graphs constitute an important class of graphs, often encountered in various applications, e.g., computational geometry, robotics, etc. In this paper, we propose a parallel algorithm for testing the isomorphism of maximal outerplanar graphs. Given the ordered adjacency lists of the two graphs, the proposed algorithm tests their isomorphism in \(O(\log N)\) time using N processors, for graphs with N nodes, on an EREW shared memory model as well as on a hypercube architecture. When the adjacency matrices of the graphs are given, this algorithm can be redesigned on \(N^2\) processors to run in \(O(\log N)\) time.

6.
We present a parallel algorithm for finding minimum cutsets in reducible graphs. For a reducible graph that has N nodes our algorithm runs in \(O(\log^3 N)\) time using \(O(N^3/\log N)\) PEs on the EREW P-RAM model of computation. We also present a parallel heuristic for finding minimal cutsets in general graphs.

7.
This paper describes a new factorization of the inverse of the joint-space inertia matrix M. In this factorization, \(M^{-1}\) is directly obtained as the product of a set of sparse matrices wherein, for a serial chain, only the inversion of a block-tridiagonal matrix is needed. In other words, this factorization reduces the inversion of a dense matrix to that of a block-tridiagonal one. As a result, this factorization leads to both an optimal serial and an optimal parallel algorithm, that is, a serial algorithm with a complexity of \(O(N)\) and a parallel algorithm with a time complexity of \(O(\log N)\) on a computer with \(O(N)\) processors. The novel feature of this algorithm is that it first calculates the interbody forces. Once these forces are known, the accelerations are easily calculated. We discuss the extension of the algorithm to the task of calculating the forward dynamics of a kinematic tree consisting of a single main chain plus any number of short side branches. We also show that this new factorization of \(M^{-1}\) leads to a new factorization of the operational-space inverse inertia, \(\Lambda^{-1}\), in the form of a product involving sparse matrices. We show that this factorization can be exploited for optimal serial and parallel computation of \(\Lambda^{-1}\), that is, a serial algorithm with a complexity of \(O(N)\) and a parallel algorithm with a time complexity of \(O(\log N)\) on a computer with \(O(N)\) processors.

8.
In this paper, a new algorithm is developed to reduce the computational complexity of Ward’s method. The proposed approach uses a dynamic k-nearest-neighbor list to avoid the determination of a cluster’s nearest neighbor at some steps of the cluster merge. The double linked algorithm (DLA) can significantly reduce the computing time of the fast pairwise nearest neighbor (FPNN) algorithm by obtaining an approximate solution of hierarchical agglomerative clustering. In this paper, we propose a method to resolve the problem of a non-optimal solution for DLA while keeping the corresponding advantage of low computational complexity. The computational complexity of the proposed method DKNNA + FS (dynamic k-nearest-neighbor algorithm with a fast search) in terms of the number of distance calculations is \(O(N^2)\), where N is the number of data points. Compared to FPNN with a fast search (FPNN + FS), the proposed method using the same fast search algorithm (DKNNA + FS) can reduce the computing time by a factor of 1.90-2.18 for the data set from a real image. In comparison with FPNN + FS, DKNNA + FS can reduce the computing time by a factor of 1.92-2.02 using the data set generated from three images. Compared to DLA with a fast search (DLA + FS), DKNNA + FS can decrease the average mean square error by 1.26% for the same data set.
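As background for the nearest-neighbor search that this abstract aims to avoid, the sketch below computes the standard Ward merge cost, \(\frac{|A||B|}{|A|+|B|}\|c_A - c_B\|^2\), and finds the cheapest pair by exhaustive search. The function names and the `(size, centroid)` tuple layout are assumptions for illustration; this is the naive baseline, not DKNNA, DLA, or FPNN.

```python
def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster squared error if clusters A and B merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist

def cheapest_merge(clusters):
    """Exhaustively find the pair of clusters with the smallest Ward cost.

    clusters is a list of (size, centroid) tuples. Returns (cost, i, j),
    or None when fewer than two clusters are given.
    """
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            cost = ward_cost(*clusters[i], *clusters[j])
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best
```

Repeating this exhaustive search after every merge is what makes naive agglomerative clustering expensive; nearest-neighbor lists trade extra bookkeeping for fewer distance calculations.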

9.
In this paper, we present a parallel sorting algorithm using the technique of multi-way merge. This algorithm, when implemented on a t-dimensional mesh having \(n^t\) nodes (t>2), sorts \(n^t\) elements in \(O((t^2-3t+2)\,n)\) time, thus offering a better order of time complexity than the \([((t^2-t)\,n \log n)/2+O(nt)]\)-time algorithm of P. F. Corbett and I. D. Scherson (1992, IEEE Trans. Parallel Distrib. Systems 3, 626–632). Further, the proposed algorithm can also be implemented on a Multi-Mesh network (1999, D. Das, M. De, and B. P. Sinha, IEEE Trans. Comput. 48, 536–551) to sort N elements in \(54N^{1/4}+o(N^{1/4})\) steps, which shows an improvement over the \(58N^{1/4}+o(N^{1/4})\) steps needed by the algorithm in (1997, M. De, D. Das, M. Ghosh, and P. B. Sinha, IEEE Trans. Comput. 46, 1132–1137).

10.
11.
Maxima-finding is a fundamental problem in computational geometry with many applications. In this paper, a volume-first maxima-finding algorithm is proposed. It is proved that the expected running time of the algorithm is N+o(N) when choosing points from a CI distribution, which is a new theoretical result when the points belong to d-dimensional space (d > 2). Experimental results and theoretical analysis indicate that the algorithm runs faster than the Move-To-Front maxima-finding algorithm.
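To pin down the problem statement, here is a small Python sketch of maxima-finding by maintaining a list of current maxima. It assumes the usual dominance definition (p dominates q if p is at least q in every coordinate and the points differ); the function names are hypothetical, and this simple list scan is neither the volume-first nor the Move-To-Front algorithm mentioned in the abstract.

```python
def dominates(p, q):
    """True if p is >= q in every coordinate and p differs from q."""
    return p != q and all(a >= b for a, b in zip(p, q))

def maxima(points):
    """Return the points not dominated by any other point, in input order."""
    result = []
    for p in points:
        if any(dominates(m, p) for m in result):
            continue  # p is dominated by a current maximum, discard it
        # p survives: drop any current maxima that p dominates.
        result = [m for m in result if not dominates(p, m)]
        result.append(p)
    return result
```

In the worst case this compares each point with every current maximum, so the order in which the list is scanned and maintained is what distinguishes the practical algorithms.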

12.
Designing parallel versions of sequential algorithms has attracted renewed attention, due to recent hardware advances, including various general-purpose multi-core and many-core processors, as well as special-purpose FPGA implementations. P systems consist of networks of autonomous cells, such that each cell transforms its input signals in accord with its symbol-rewriting rules and feeds the output results into its immediate neighbours. Inherent massive intra- and inter-cell parallelisms make P systems a prospective theoretical testbed for designing efficient parallel and parallel-sequential algorithms. This paper discusses the capabilities of P systems to implement the symmetric dynamic programming stereo (SDPS) matching algorithm, which explicitly accounts for binocular or monocular visibility of 3D surface points. Given enough cells, the P system implementation speeds up the inner algorithm loop from O(nd) to O(n+d), where n is the width of a stereo image and d is the disparity range. The implementation also gives insight into a more general SDPS that accounts for a possible multiplicity of solutions of the ill-posed problem of optimal stereo matching.

13.
We analyze two single machine scheduling problems for the case where job processing times are controllable, by allocating continuous and non-renewable resources to the processing operations. The first problem to analyze is constructing the trade-off curve between maximum lateness and total resource consumption; an \(O(n^2)\) computational time optimization algorithm was constructed to solve this problem. This algorithm was extended to solve the second problem, which is to construct the trade-off surface between maximum lateness, makespan, and total resource consumption. As part of this algorithm we identify a plane in the 3D field that is formed by the three criteria, which is parallel only to the maximum lateness, and calculate the optimal makespan and total resource consumption as functions of points on this plane. The extended algorithm keeps the same complexity of \(O(n^2)\) time. Both algorithms are very robust as they solve the problem for a very large set of resource consumption functions which has to follow only some mild (and commonly acceptable) conditions. Moreover, as far as we know, this is the first research of its kind in the field of multi-objective scheduling to present an algorithm that constructs a 3D trade-off surface.

14.
The reconfigurable array with slotted optical buses (RASOB) has recently received a lot of attention from the research community. In this paper, we first discuss the reconfiguration methods and communication capabilities of the RASOB architecture. Then, we use this architecture for the implementation of efficient sorting algorithms on the 1D RASOB and the 2D RASOB. Our parallel sorting algorithm on the 1D RASOB is based on an efficient divide-and-conquer scheme. It sorts N data items using N processors in O(k) communication cycles, where k is the size of the data items to be sorted in bits. We further develop a parallel sorting algorithm on the 2D RASOB based on the sorting algorithm on the 1D RASOB in conjunction with the well known Rotatesort algorithm. Similarly, this algorithm sorts N data items on a 2D RASOB of size N in O(k) communication cycles. These sorting algorithms are much more efficient than state-of-the-art sorting algorithms on reconfigurable arrays of processors with electronic buses using the same number of processors.

15.
The shear crack, propagating spontaneously on a frictional interface, is a useful idealization of a natural earthquake. However, the corresponding boundary value problems are quite challenging in terms of required memory and processor power. Although the spectral boundary integral method reduces the amount of computation, the computational effort remains substantial. In this paper, a recursive method for the evaluation of convolution integrals was tested in the spectral formulation of the boundary integral method applied to 2D anti-plane crack propagation problems. It is shown that analysis of a 2D anti-plane crack propagation problem involving \(N_t\) time steps, based on the recursive evaluation of convolution integrals, requires \(O(\alpha N_t)\) computational resources for each Fourier mode (as opposed to \(O(N_t^2)\) for a classical algorithm), where \(\alpha\) is a constant depending on the implementation of the method with typical values much less than \(N_t\). Therefore, this recursive scheme renders feasible the investigation of long deformational processes involving large surfaces and long periods of time, while preserving accuracy. The computation methodology implemented here can be extended easily to 3D cases, where it can be employed for the simulation of complex spontaneous fault rupture problems which carry a high computational cost.

16.
A novel reconfigurable network referred to as the Reconfigurable Multi-Ring Network (RMRN) is described. The RMRN is shown to be a truly scalable network, in that each node in the network has a fixed degree of connectivity and the reconfiguration mechanism ensures a network diameter of \(O(\log_2 N)\) for an N-processor network. Algorithms for the 2-D mesh and the SIMD n-cube are shown to map very elegantly onto the RMRN. Basic message passing and reconfiguration primitives for the SIMD RMRN are designed which could be used as building blocks for more complex parallel algorithms. The RMRN is shown to be a viable architecture for image processing and computer vision problems via the parallel computation of the Hough transform. The parallel implementation of the Y-angle Hough transform of an N × N image is shown to have an asymptotic complexity of \(O(Y \log_2 Y + \log_2 N)\) on the SIMD RMRN with \(O(N^2)\) processors. This compares favorably with the \(O(Y + \log_2 N)\) optimal algorithm for the same Hough transform on the MIMD n-cube with \(O(N^2)\) processors.

17.
Independent spanning trees (ISTs) on networks have applications in increasing fault-tolerance, bandwidth, and security. Möbius cubes are an important class of hypercube variants. A recursive algorithm to construct n ISTs on the n-dimensional Möbius cube \(M_n\) was proposed in the literature. However, there are dependency relationships during the construction of the ISTs, and the time complexity of the algorithm is as high as \(O(N\log N)\), where \(N=2^n\) is the number of vertices in \(M_n\) and n≥2. In this paper, we study the parallel construction and a diagnostic application of ISTs on Möbius cubes. First, based on a circular permutation n−1, n−2, …, 0 and the definitions of dimension-backbone walk and dimension-backbone tree, we propose an \(O(N)\) parallel algorithm, called PMCIST, to construct n ISTs rooted at an arbitrary vertex on \(M_n\). Based on algorithm PMCIST, we further present an \(O(n)\) parallel algorithm. Then we provide a parallel diagnostic algorithm with high efficiency to diagnose all the vertices in \(M_n\) in at most n+1 steps, provided the number of faulty vertices does not exceed n. Finally, we present simulation experiments of ISTs and an application of ISTs in diagnosis on 0-\(M_4\).

18.
The basic goal in combinatorial group testing is to identify a set of up to d defective items within a large population of size \(n \gg d\) using a pooling strategy. Namely, the items can be grouped together in pools, and a single measurement would reveal whether there are one or more defectives in the pool. The threshold model is a generalization of this idea where a measurement returns positive if the number of defectives in the pool reaches a fixed threshold \(u>0\), negative if this number is no more than a fixed lower threshold \(\ell<u\), and may behave arbitrarily otherwise. We study non-adaptive threshold group testing (in a possibly noisy setting) and show that, for this problem, \(O(d^{g+2}(\log d)\log(n/d))\) measurements (where \(g:=u-\ell-1\) and u is any fixed constant) suffice to identify the defectives, and also present almost matching lower bounds. This significantly improves the previously known (non-constructive) upper bound \(O(d^{u+1}\log(n/d))\). Moreover, we obtain a framework for explicit construction of measurement schemes using lossless condensers. The number of measurements resulting from this scheme is ideally bounded by \(O(d^{g+3}(\log d)\log n)\). Using state-of-the-art constructions of lossless condensers, however, we obtain explicit testing schemes with \(O(d^{g+3}(\log d)\,\mathrm{quasipoly}(\log n))\) and \(O(d^{g+3+\beta}\,\mathrm{poly}(\log n))\) measurements, for arbitrary constant \(\beta>0\).
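The threshold measurement model described in this abstract can be written down directly. The sketch below is only a model of a single noiseless measurement, with hypothetical names (`threshold_test`, `gap`) and `None` standing in for the "may behave arbitrarily" region; it does not construct any measurement scheme.

```python
def threshold_test(pool, defectives, upper, lower):
    """Outcome of one pooled measurement in the threshold model.

    Returns True if the pool holds at least `upper` defectives, False if
    it holds at most `lower`, and None in between, where the model
    allows an arbitrary answer.
    """
    if not 0 <= lower < upper:
        raise ValueError("thresholds must satisfy 0 <= lower < upper")
    count = len(set(pool) & set(defectives))
    if count >= upper:
        return True
    if count <= lower:
        return False
    return None

def gap(upper, lower):
    """The gap parameter g = u - l - 1 that appears in the bounds."""
    return upper - lower - 1
```

With `upper=1` and `lower=0` the gap is zero and the function reduces to classical group testing, where a pool is positive exactly when it contains at least one defective.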

19.
We present a simple parallel algorithm for the single-source shortest path problem in planar digraphs with nonnegative real edge weights. The algorithm runs on the EREW PRAM model of parallel computation in \(O((n^{2\varepsilon}+n^{1-\varepsilon}) \log n)\) time, performing \(O(n^{1+\varepsilon} \log n)\) work for any \(0<\varepsilon<1/2\). The strength of the algorithm is its simplicity, making it easy to implement and presumably quite efficient in practice. The algorithm improves upon the work of all previous parallel algorithms. Our algorithm is based on a region decomposition of the input graph and uses a well-known parallel implementation of Dijkstra's algorithm. The logarithmic factor in both the work and the time can be eliminated by plugging in a less practical, sequential planar shortest path algorithm together with an improved parallel implementation of Dijkstra's algorithm.
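Since the algorithm builds on Dijkstra's algorithm, a minimal sequential version is sketched here for reference. The adjacency-dictionary format and the function name are assumptions; the sketch does not include the region decomposition or any parallelism described in the abstract.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths for nonnegative edge weights.

    graph maps each node to a list of (neighbor, weight) pairs.
    Returns a dict of distances for every node reachable from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

For example, on the graph `{"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}` the distance from `"a"` to `"c"` is 3.0, reached through `"b"` rather than along the direct edge.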

20.
In this paper we report on a high-order fast method to numerically calculate a convolution integral with a smooth non-periodic kernel. This method is based on the Newton-Cotes quadrature rule for the integral approximation and an FFT method for discrete summation. In principle, the method can reach arbitrarily high-order accuracy, depending on the number of points used in the integral approximation, at a computational cost of \(O(N\log N)\), where N is the number of grid points. For a three-point Simpson rule approximation, the method has an accuracy of \(O(h^4)\), where h is the size of the computational grid. Applications of the Simpson rule based algorithm to the calculation of a one-dimensional continuous Gauss transform and to the calculation of a two-dimensional electric field from a charged beam are also presented.
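The fourth-order accuracy quoted for the Simpson rule can be checked numerically. The sketch below applies the composite Simpson rule to a plain integral (the integrand `math.exp` on [0, 1] and the function name `simpson` are choices made for illustration) and compares the error at two grid sizes; it does not implement the FFT-based convolution of the abstract.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior weights alternate 4, 2, 4, ... starting at i = 1.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

exact = math.e - 1.0  # integral of exp(x) over [0, 1]
err_coarse = abs(simpson(math.exp, 0.0, 1.0, 8) - exact)
err_fine = abs(simpson(math.exp, 0.0, 1.0, 16) - exact)
ratio = err_coarse / err_fine  # close to 2**4 for an O(h^4) rule
```

Halving h should shrink the error by roughly a factor of 16, which is the signature of an \(O(h^4)\) method.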
