首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 390 毫秒
1.
This paper develops optimal algorithms to multiply an n × n symmetric tridiagonal matrix by: (i) an arbitrary n × m matrix using 2nmm multiplications; (ii) a symmetric tridiagonal matrix using 6n − 7 multiplications; and (iii) a tridiagonal matrix using 7n −8 multiplications. Efficient algorithms are also developed to multiply a tridiagonal matrix by an arbitrary matrix, and to multiply two tridiagonal matrices.  相似文献   

2.
This paper outlines an algorithm for optimum linear ordering (OLO) of a weighted parallel graph with O(n log k) worst-case time complexity, and O(n + k log(n/k) log k) expected-case time complexity, where n is the total number of nodes and k is the number of chains in the parallel graph. Next, the two-layer OLO problem is considered, where the goal is to place the nodes linearly in two routing layers minimizing the total wire length. The two-layer problem is shown to subsume the maxcut problem and a befitting heuristic algorithm is proposed. Experimental results on randomly generated samples show that the heuristic algorithm runs very fast and outputs optimum solutions in more than 90% instances.  相似文献   

3.
In this paper we present an O(log n) time parallel algorithm for arithmetic expression evaluation, on an n × n processor array with reconfigurable bus system, where n is the sum of the number of operators and constants in the expression. The basic technique involved here is leaves-cutting (rake operation), as in the case of PRAM model algorithms available in the literature for this problem. The input to our algorithm is assumed to be the binary tree associated with a given expression (also known as expression tree with n number of nodes). Our algorithm is faster compared to the previous best time for expression evaluation on mesh connected computers which is O(√n).  相似文献   

4.
We show that the notoriously difficult problem of finding and reporting the smallest number of vertex-disjoint paths that cover the vertices of a graph can be solved time- and work-optimally for cographs. Our result implies that for this class of graphs the task of finding a Hamiltonian path can be solved time- and work-optimally in parallel.

It was open for more than 10 years to find a time- and work-optimal parallel solution for this important problem. Our contribution is to offer an optimal solution to this important problem. We begin by showing that any algorithm that solves an instance of size n of the problem must take Ω(log n) time on the CREW, even if an infinite number of processors are available. We then go on to show that this time lower bound is tight by devising an EREW algorithm that, given an n-vertex cograph G represented by its cotree, finds and reports all the paths in a minimum path cover in O(log n) time using n/log n processors.  相似文献   


5.
Two parallel algorithms for finding minimum spanning forest (MSF) of a weighted undirected graph on hypercube computers, consisting of a fixed number of processors, are presented. One algorithm is suited for sparse graphs, the other for dense graphs. Our design strategy is based on successive elimination of non-MSF edges. The input graph is partitioned equally among different processors, which then repeatedly eliminate non-MSF edges and merge results to gradually construct the desired MSF of the entire graph. Low communication overhead is achieved by restricting the message-flow to between the neighboring processors in the hypercube topology. The correctness of our approach is due to a theorem which states that with total-ordered edges, if an edge of an arbitrary subgraph does not belong to its MSF, then it does not belong to the MSF of the entire graph. For a graph of n vertices and m edges, our first algorithm finds an MSF in O(m log m)/p) time using p processors for p ≤ (mlog m)/n(1+log(m/n)). The second algorithm, efficient for dense graphs, requires O(n2/p) time for pn/log n.  相似文献   

6.
Our approach combines the method of inexact steepest descent with the method of contractor directions to obtain an algorithm for solving systems of linear equations. In order to enhance the scope of applicability, we consider an iterative method with variable step-size iterations. We prove the convergence and given an error estimate for our method.

The algorithm is well-suited for parallel computation. In fact, for systems with m equations and n unknowns, each iteration may be computed in parallel time O(log m + log n), on an EREW PRAM with O(mn) processors.  相似文献   


7.
We compare five implementations of the Jacobi method for diagonalizing a symmetric matrix. Two of these, the classical Jacobi and sequential sweep Jacobi, have been used on sequential processors. The third method, the parallel sweep Jacobi, has been proposed as the method of choice for parallel processors. The fourth and fifth methods are believed to be new. They are similar to the parallel sweep method but use different schemes for selecting the rotations.

The classical Jacobi method is known to take O(n4) time to diagonalize a matrix of order n. We find that the parallel sweep Jacobi run on one processor is about as fast as the sequential sweep Jacobi. Both of these methods take O(n3 log2n) time. One of our new methods also takes O(n3 log2n) time, but the other one takes only O(n3) time. The choice among the methods for parallel processors depends on the degree of parallelism possible in the hardware. The time required to diagonalize a matrix on a variety of architectures is modeled.

Unfortunately for proponents of the Jacobi method, we find that the sequential QR method is always faster than the Jacobi method. The QR method is faster even for matrices that are nearly diagonal. If we perform the reduction to tridiagonal form in parallel, the QR method will be faster even on highly parallel systems.  相似文献   


8.
Given n points in the plane the planar dominance counting problem is to determine for each point the number of points dominated by it. Point p is said to dominate point q if x(q)x(p) and y(q)y(p), when x(p) and y(p) are the x− and y-coordinate of p, respectively. We present two CREW PRAM parallel algorithms for the problem, one running in O(log n loglog n) time and and the other in O(lognloglogn/logloglogn) time both using O(n) processors. Some applicationsare also given.  相似文献   

9.
We show that an n-node complete binary tree can be embedded into an n-node linear array such that the maximum edge length is n/log n, and the distribution of the tree nodes is regular in the sense that if each node in the tree array is a finite-state machine and each edge is a communication channel, then the message exchanges (i.e., communication) among the tree nodes can be easily simulated by the linear array which has also finite-state machines for its nodes. The embedding is optimal with respect to the maximum edge length. The motivation for this is the proof of the following result: every cellular (or (log n)-depth iterative) tree array running in T(n) time can be simulated by a linear cellular (or iterative) array in nT(n)/log n time.  相似文献   

10.
A c-vertex-ranking of a graph G for a positive integer c is a labeling of the vertices of G with integers such that, for any label i, deletion of all vertices with labels >i leaves connected components, each having at most c vertices with label i. A c-vertex-ranking is optimal if the number of labels used is as small as possible. We present sequential and parallel algorithms to find an optimal c-vertex-ranking of a partial k-tree, that is, a graph of treewidth bounded by a fixed integer k. The sequential algorithm takes polynomial-time for any positive integer c. The parallel algorithm takes O(log n) parallel time using a polynomial number of processors on the common CRCW PRAM, where n is the number of vertices in G.  相似文献   

11.
A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).  相似文献   

12.
We previously proved that almost all words of length n over a finite alphabet A with m letters contain as factors all words of length k(n) over A as n→∞, provided limsupn→∞ k(n)/log n<1/log m.

In this note it is shown that if this condition holds, then the number of occurrences of any word of length k(n) as a factor into almost all words of length n is at least s(n), where limn→∞ log s(n)/log n=0. In particular, this number of occurrences is bounded below by C log n as n→∞, for any absolute constant C>0.  相似文献   


13.
Given a set of n points on the plane, a symmetric furthest-neighbor (SFN) pair of points p, q is one such that both p and q are furthest from each other among the points in . A pair of points is antipodal if it admits parallel lines of support. In this paper it is shown that a SFN pair of is both a set of extreme points of and an antipodal pair of . It is also shown that an asymmetric furthest-neighbor (ASFN) pair is not necessarily antipodal. Furthermore, if is such that no two distances are equal, it is shown that as many as, and no more than, n/2 pairs of points are SFN pairs. A polygon is unimodal if for each vertex pk, k = 1,…,n the distance function defined by the euclidean distance between pk and the remaining vertices (traversed in order) contains only one local maximum. The fastest existing algorithms for computing all the ASFN or SFN pairs of either a set of points, a simple polygon, or a convex polygon, require 0(n log n) running time. It is shown that the above results lead to an 0(n) algorithm for computing all the SFN pairs of vertices of a unimodal polygon.  相似文献   

14.
15.
An on-line parser processes each word as soon as it is typed by the user, without waiting for the end of the sentence. Thus, in an interactive system, a sentence will be parsed almost immediately after the last word has been presented.

The complexity of an on-line parser is determined by the resources needed for the analysis of a single word, as it is assumed that previous words have been processed already. Sequential parsing algorithms like CYK or Earley need O(n2) time for the nth word. A parallel implementation in O(n) time on O(n) processors is straightforward. In this paper a novel parallel on-line parser is presented that needs O(1) time on O(n2) processors.  相似文献   


16.
A parallel two-list algorithm for the knapsack problem   总被引:10,自引:0,他引:10  
An n-element knapsack problem has 2n possible solutions to search over, so a task which can be accomplished in 2″ trials if an exhaustive search is used. Due to the exponential time in solving the knapsack problem, the problem is considered to be very hard. In the past decade, much effort has been done in order to find techniques which could lead to practical algorithms with reasonable running time. In 1994, Chang et al. proposed a brilliant parallel algorithm, which needs O(2n/8) processors to solve the knapsack problem in O(2n/2) time; that is, the cost of Chang et al.'s parallel algorithm is O(25n/8). In this paper, we propose a parallel algorithm to improve Chang et al.'s parallel algorithm by reducing the time complexity to be O(23n/8) under the same O(2n/8) processors available. Thus, the proposed parallel algorithm has a cost of O(2n/2). It is an improvement over previous literature. We believe that the proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.  相似文献   

17.
We present a fast vector algorithm which solves tridiagonal linear equations by an optimum synthesis of the inherently recursive Gaussian elimination and the parallel though complex cyclic reduction. The idea is to perform an incomplete cyclic reduction to bring the dimension of the tridiagonal system efficiently below a characteristic size n* and then to solve the remaining system by Gaussian elimination. Extensive numerical experiments on the CYBER 205 and the CRAY X-MP computers reveal a maximum vector speedup of 13 and prove n* to reflect the architecture of the vector computer. The performance is further enhanced when a feq right-hand sides are treated simultaneously.  相似文献   

18.
We substantially improve the known algorithms for approximating all the complex zeros of an nth degree polynomial p(x). Our new algorithms save both Boolean and arithmetic sequential time, versus the previous best algorithms of Schönhage [1], Pan [2], and Neff and Reif [3]. In parallel (NC) implementation, we dramatically decrease the number of processors, versus the parallel algorithm of Neff [4], which was the only NC algorithm known for this problem so far. Specifically, under the simple normalization assumption that the variable x has been scaled so as to confine the zeros of p(x) to the unit disc x : |x| ≤ 1, our algorithms (which promise to be practically effective) approximate all the zeros of p(x) within the absolute error bound 2b, by using order of n arithmetic operations and order of (b + n)n2 Boolean (bitwise) operations (in both cases up to within polylogarithmic factors). The algorithms allow their optimal (work preserving) NC parallelization, so that they can be implemented by using polylogarithmic time and the orders of n arithmetic processors or (b + n)n2 Boolean processors. All the cited bounds on the computational complexity are within polylogarithmic factors from the optimum (in terms of n and b) under both arithmetic and Boolean models of computation (in the Boolean case, under the additional (realistic) assumption that n = O(b)).  相似文献   

19.
We present an optimum bit-parallel/word-sequential systolic convolver. Our design is the best one among the previous many convolvers in the sense that its optimality in time and space performances is simultaneously attained without augmenting any global control, broadcasting, initial-data-preloading, and/or multi-sequential or parallel I/O ports which were allowed in most of the previous designs. As an application of our convolver we give a systolic polynomial divider which can compute the polynomial division in exactly n + O(1) optimum steps on [min(nm, m)/2]+O(1) systolic cells for the division of any degree n polynomial by any degree m polynomial (n m).  相似文献   

20.
冯凯  李婧 《计算机应用》2019,39(11):3323-3327
并行计算机系统功能的实现很大程度上依赖于系统互连网络的性能。为了精确度量以kn方体为底层拓扑结构的并行计算机系统的容错能力,研究了点故障模型下kn方体中k元(n-1)方体子网络的可靠性。当k ≥ 3且为奇数时,分别在固定划分模式和灵活划分模式下对kn方体中不同数目的k元(n-1)方体子网络保持无故障状态的平均失效时间进行了分析,并得出了这一子网络可靠性评估参数的计算公式。结果表明,当基于k为奇数的kn方体构建的并行计算机系统指派子网络执行用户任务时,在点故障模型下灵活划分模式相比固定划分模式有着更好的容错能力。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号