1.

Design of efficient regular arrays for matrix multiplication by two-step regularization

Jong-Chuang Tsay Pen-Yuang Chang 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):215-222

A two-step regularization method, in which first permutation sequences and then broadcast planes are selected, is proposed to design various regular iterative algorithms for matrix multiplication. The regular iterative algorithms are then space-time mapped to regular arrays, such as mesh, cylindrical, two-layered mesh, and orbital arrays. The proposed method can be used to design regular arrays with an execution time of less than N (the problem size).

2.

Designing linear systolic arrays

V. K. Prasanna Kumar Yu-Chen Tsai 《Journal of Parallel and Distributed Computing》1989,7(3)

We develop a simple mapping technique to design linear systolic arrays. The basic idea of our technique is to map the computations of a certain class of two-dimensional systolic arrays onto one-dimensional arrays. Using this technique, systolic algorithms are derived for problems such as matrix multiplication and transitive closure on linearly connected arrays of PEs with constant I/O bandwidth. Compared to known designs in the literature, our technique leads to modular systolic arrays with constant hardware in each PE, few control lines, lexicographic data input/output, and improved delay time. The unidirectional flow of control and data in our design assures implementation of the linear array in the known fault models of Wafer Scale Integration.

3.

On mapping systolic algorithms onto the hypercube

Ibarra O.H. Sohn S.M. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(1):48-63

Consideration is given to the problem of mapping systolic array algorithms into efficient algorithms for a fixed-size hypercube architecture. The authors describe in detail several optimal implementations of algorithms given for one-way one- and two-dimensional systolic arrays. Since interprocessor communication is many times slower than local computation in parallel computers built to date, the problem of efficient communication is specifically addressed for these mappings. In order to validate the technique experimentally, five systolic algorithms were mapped in various ways onto a 64-node NCUBE/7 MIMD hypercube machine. The algorithms are for the following problems: the shuffle scheduling problem, finite impulse response filtering, linear context-free language recognition, matrix multiplication, and computing the Boolean transitive closure. Experimental evidence indicates that good performance is obtained for the mappings.

4.

A modular systolic linearization of the Warshall-Floyd algorithm

Myoupo J.F. Fabret A.C. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(5):449-455

In this paper, we use a variant of the geometric method to derive efficient modular linear systolic algorithms for the transitive closure and shortest path problems. Furthermore, we show that partially pipelined modular linear systolic algorithms with an output operation, for matrix multiplication, can be as fast as the existing fully pipelined ones and, moreover, need fewer cells.

5.

Computing transitive closure on systolic arrays of fixed size

Björn Lisper 《Distributed Computing》1991,5(3):133-144

Summary. Forming the transitive closure of a binary relation (or directed graph) is an important part of many algorithms. When the relation is represented by a bit matrix, the transitive closure can be efficiently computed in parallel in a systolic array. Here we propose two novel ways of computing the transitive closure of an arbitrarily big graph on a systolic array of fixed size. The first method is a simple partitioning of a well-known systolic algorithm for computing the transitive closure. The second is a block-structured algorithm. This algorithm is suitable for execution on a systolic array that can multiply fixed-size bit matrices and compute the transitive closure of graphs with a fixed number of nodes. The algorithm is, however, not limited to systolic array implementations; it works on any parallel architecture that can perform these bit matrix operations efficiently. The shortest path problem for directed graphs with weighted edges can also be solved in the same manner as the transitive closure. Björn Lisper was born in 1956 in Solna, Sweden. He received the M. Eng. Physics degree in 1980 and the Ph.D. degree in Computer Science in 1987, both from the Royal Institute of Technology in Stockholm. Currently he shares his time between the Royal Institute of Technology and the Swedish Institute of Computer Science. His research interests are mainly in the area of formal methods for deriving efficient parallel implementations of algorithms, including synthesis of fixed hardware structures for specific algorithms and compilation techniques for tightly coupled parallel systems. Dr. Lisper is a member of the European Association for Theoretical Computer Science.

6.

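Optimal linear array design for some three-dimensional algorithms

杨超峰 胡铭曾 《计算机研究与发展》1999,36(6):751-757

Since the late 1980s, a new direction in processor array research has been the design of linear arrays. Lee and Kedem did pioneering work in this area and proposed a framework for linear array design, but no effective design method has been available so far. This paper proposes a design method for linear arrays with which a designer can obtain a feasible design by analyzing the longest path of the data dependence graph corresponding to the algorithm.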

7.

A survey of parallel execution strategies for transitive closure and logic programs

Filippo Cacace Stefano Ceri Maurice Houtsma 《Distributed and Parallel Databases》1993,1(4):337-382

An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. 
Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods. Recommended by: Patrick Valduriez

8.

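A parallel transitive closure algorithm for heterogeneous architectures

肖汉 郭宝云 李彩林 周清雷 《计算机工程》2021,47(8):131-139

Traditional methods for computing the transitive closure of a graph suffer from a heavy computational load and long running times. To speed up transitive closure computation on large data sets, this paper combines the compute-intensive nature of the algorithm with the features of the Open Computing Language (OpenCL) framework and proposes an OpenCL-based parallel transitive closure algorithm that uses local-memory-optimized parallel submatrix multiplication and blocked parallel matrix multiplication. The local-memory-optimized parallel submatrix multiplication optimizes the computation steps, improves memory utilization on the graphics processing unit (GPU), and reduces data access latency. The blocked parallel matrix multiplication handles matrix products on large data sets and improves the utilization of the GPU compute cores. Experimental results show that, compared with the serial CPU algorithm, the parallel algorithm based on Open Multi-Processing (OpenMP), and the parallel algorithm based on Compute Unified Device Architecture (CUDA), the parallel transitive closure algorithm achieves speedups of 593.14x, 208.62x, and 1.05x, respectively, on an NVIDIA GeForce GTX 1070 platform under the OpenCL architecture.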
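The blocked matrix product mentioned in this abstract can be illustrated with a short sequential sketch. This is not the authors' OpenCL kernel: the function name `blocked_bool_matmul` and the `block` parameter are hypothetical, and plain Python loops stand in for GPU work-groups. It only shows how a boolean product is split into tiles, each small enough (on a GPU) to sit in local memory.

```python
def blocked_bool_matmul(a, b, block=2):
    """Boolean product of two n x n 0/1 matrices, computed tile by tile.

    Hypothetical illustration of tiling only; a GPU version would assign
    each (i0, j0) tile to a work-group and stage tiles in local memory.
    """
    n = len(a)
    c = [[False] * n for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of the result
        for j0 in range(0, n, block):      # tile column of the result
            for k0 in range(0, n, block):  # tiles along the inner dimension
                k_end = min(k0 + block, n)
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, n)):
                        if not c[i][j]:
                            # OR-accumulate the partial product of this tile pair.
                            c[i][j] = any(
                                a[i][k] and b[k][j] for k in range(k0, k_end)
                            )
    return c
```

The result does not depend on `block`; the tile size only changes the order in which partial products are accumulated.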

9.

Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems

Wang B.-F. Chen G.-H. 《Parallel and Distributed Systems, IEEE Transactions on》1990,1(4):500-507

The transitive closure problem is solved in O(1) time by a new method that differs substantially from the conventional solution method. On processor arrays with reconfigurable bus systems, two O(1) time algorithms are proposed for computing the transitive closure of an undirected graph. One is designed on a three-dimensional n×n×n processor array with a reconfigurable bus system, and the other is designed on a two-dimensional n²×n² processor array with a reconfigurable bus system, where n is the number of vertices in the graph. Using the O(1) time transitive closure algorithms, many other graph problems are solved in O(1) time. These problems include recognizing bipartite graphs and finding connected components, articulation points, biconnected components, bridges, and minimum spanning trees in undirected graphs.

10.

Constant Time Boolean Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System

Li Keqin 《The Journal of supercomputing》1997,11(4):391-403

We show that the product of two N × N boolean matrices can be calculated in constant time on an LARPBS (linear array with a reconfigurable pipelined bus system) with O(N^3 / log N) processors. All data communications and computations are performed on the bit level. To the best of the author's knowledge, this is the first parallel boolean matrix multiplication algorithm that has constant execution time, and is executed on a distributed memory system with (N^3) processors. By using our boolean matrix multiplication algorithm, it is shown that the transitive closure of a directed graph can be obtained in O(log N) time (measured by bit-level operations) on an LARPBS with O(N^3 / log N) processors. To the best of our knowledge, this is the first parallel algorithm for transitive closure of directed graphs with time complexity O(log N) (comparable to that of CRCW PRAM) and cost O(N^3) on a realistic parallel computing model, which has no shared memory, and in which interprocessor communications are dealt with explicitly and efficiently.
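The link between boolean matrix multiplication and an O(log N)-step transitive closure can be sketched sequentially. The following is a minimal illustration of the standard repeated-squaring technique, not the LARPBS algorithm of the paper; the function names are hypothetical. With the identity OR-ed in, about log2(N) squarings cover all paths of length up to N - 1, so a constant-time product would yield a logarithmic number of parallel steps.

```python
def bool_matmul(a, b):
    """Boolean product of two n x n matrices given as lists of lists."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure_by_squaring(adj):
    """Reflexive-transitive closure of a directed graph by repeated squaring."""
    n = len(adj)
    # Add self-loops: (A OR I)^k then marks every path of length <= k.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    covered = 1  # longest path length currently accounted for
    while covered < n - 1:
        reach = bool_matmul(reach, reach)  # doubles the covered path length
        covered *= 2
    return reach
```

For a chain 0 → 1 → 2 → 3, two squarings suffice, and the result marks vertex 3 as reachable from vertex 0 but not the reverse.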

11.

An efficient database transitive closure algorithm

I. H. Toroslu G. Z. Qadah L. Henschen 《Applied Intelligence》1994,4(2):205-218

The integration of logic rules and relational databases has recently emerged as an important technique for developing knowledge management systems. An important class of logic rules utilized by these systems is the so-called transitive closure rules, the processing of which requires the computation of the transitive closure of database relations referenced by these rules. This article presents a new algorithm suitable for computing the transitive closure of very large database relations. This algorithm proceeds in two phases. In the first phase, a general graph is condensed into an acyclic one, and at the same time a special sparse matrix is formed from the acyclic graph. The second phase is the main one, in which all the page I/O operations are minimized by removing most of the redundant operations that appear in previous algorithms. Using simulation, this article also studies and examines the performance of this algorithm and compares it with the previous algorithms.

12.

The design and time analysis of a systolic array with asynchronous protocols for matrix multiplication

Moon S. Jun 《Computers & Electrical Engineering》1989,15(3-4):107-120

In this paper, we present new techniques for designing systolic arrays and asynchronous arrays for recursive algorithms. More specifically, we propose a systolic array with simple local interconnections for matrix multiplication which achieves optimal performance without having undesirable features such as preloading input data or global broadcasting. An asynchronous array for matrix multiplication which can speed up the total computation time significantly is also presented. The key component of the asynchronous array is a communication protocol which controls input data flow properly and efficiently. Finally, performance of the arrays is analyzed and a simulation using Occam programmed in a Transputer network is reported.

13.

Computational experiences with some transitive closure algorithms

Dr. M. M. Sysło Mgr. J. Dzikiewicz 《Computing》1975,15(1):33-39

The paper contains results of computational experiences with the following algorithms for finding the transitive closure of a digraph: (i) Warshall's algorithm [17], (ii) Purdom's algorithm [13], (iii) the modification of Yen's algorithm [14], and (iv) the new algorithms for finding the transitive closure [3, 4]. The tested digraphs were generated at random. The enclosed references contain all papers known to the authors concerning transitive closure algorithms.
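Of the algorithms compared in this abstract, Warshall's is the most compact to write down. The sketch below is a textbook version in Python, not the implementation tested in the paper; the function name `warshall` is an assumption. It treats each vertex k in turn as a permitted intermediate stop and runs in O(n^3) time.

```python
def warshall(adj):
    """Transitive closure of a digraph given as an n x n 0/1 adjacency matrix."""
    n = len(adj)
    reach = [[bool(x) for x in row] for row in adj]  # work on a copy
    for k in range(n):              # allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:         # i reaches k ...
                for j in range(n):
                    if reach[k][j]: # ... and k reaches j, so i reaches j
                        reach[i][j] = True
    return reach
```

Unlike a reflexive closure, this version marks a vertex as reaching itself only if it lies on a cycle.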

14.

On high-speed computing with a programmable linear array

Peizong Lee Zvi M. Kedem 《The Journal of supercomputing》1990,4(3):223-249

It has been observed by many researchers that systolic arrays are very suitable for certain high-speed computations. Using a formal methodology, we present a design for a single simple programmable linear systolic array capable of solving large numbers of problems drawn from a variety of applications. The methodology is applicable to problems solvable by sequential algorithms that can be specified as nested for-loops of arbitrary depth. The algorithms of this form that can be computed on the array presented in this paper include 25 algorithms dealing with signal and image processing, algebraic computations, matrix arithmetic, pattern matching, database operations, sorting, and transitive closure. Assuming bounded I/O, for 18 of those algorithms the time and storage complexities are optimal, and therefore no improvement can be expected by using dedicated special-purpose linear systolic arrays designed for individual algorithms. We also describe another design which, using a sufficiently large local memory and allowing data to be preloaded and unloaded, has an optimal processor/time product. An earlier version of this paper was presented at Supercomputing '88. This work was partially supported by ONR under contract N00014-85-K-0046 and by NSF under Grant Number CCR-8906949.

15.

Algorithms for computing the optimal transitive approximation of a proximity relation

Guan Nan Deng Kai Hu Hong Xing Li 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2011,15(5):1023-1038

Three ways for generating the optimal transitive approximations or a suboptimal transitive approximation are given in this paper. The first one can obtain all the optimal transitive approximations for any proximity relation. However, trying to find all the optimal transitive approximations can be very expensive. The second one gives a method to obtain a suboptimal transitive approximation which can frequently generate an optimal transitive approximation. Furthermore, starting from the transitive closure the third method is proposed which can obtain a locally optimal transitive approximation. Finally, numerical experiments are carried out to show the abilities of these algorithms and compare them to other existing approximation algorithms.

16.

Fast Dynamic Transitive Closure with Lookahead

Piotr Sankowski Marcin Mucha 《Algorithmica》2010,56(2):180-197

In this paper we consider the problem of dynamic transitive closure with lookahead. We present a randomized one-sided error algorithm with updates and queries in $O(n^{\omega(1,1,\varepsilon)-\varepsilon})$ time given a lookahead of $n^{\varepsilon}$ operations, where $\omega(1,1,\varepsilon)$ is the exponent of multiplication of an $n \times n$ matrix by an $n \times n^{\varepsilon}$ matrix. For $\varepsilon \le 0.294$ we obtain an algorithm with queries and updates in $O(n^{2-\varepsilon})$ time, whereas for $\varepsilon = 1$ the time is $O(n^{\omega-1})$. This is essentially optimal as it implies an $O(n^{\omega})$ algorithm for boolean matrix multiplication. We also consider the offline transitive closure in planar graphs. For this problem, we show an algorithm that requires $O(n^{\omega/2})$ time to process $n^{1/2}$ operations. We also show a modification of these algorithms that gives faster amortized queries. Finally, we give faster algorithms for a restricted type of updates, so-called element updates. All of the presented algorithms are randomized with one-sided error. All our algorithms are based on dynamic algorithms with lookahead for matrix inverse, which are of independent interest.

17.

Solving an algebraic path problem and some related graph problems on a hyper-bus broadcast network

Horng-Ren Tsai Shi-Jinn Horng Shun-Shan Tsai Tzong-Wann Kao Shung-Shing Lee 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(12):1226-1235

The parallel computation model upon which the proposed algorithms are based is the hyper-bus broadcast network. The hyper-bus broadcast network consists of processors which are connected by global buses only. Based on such an improved architecture, we first design two O(1) time basic operations for finding the maximum and minimum of N numbers each of size O(log N)-bit and computing the matrix multiplication operation of two N×N matrices, respectively. Then, based on these two basic operations, three of the most important instances in the algebraic path problem, the connectivity problem, and several related problems are all solved in O(log N) time. These include the all-pair shortest paths, the minimum-weight spanning tree, the transitive closure, the connected component, the biconnected component, the articulation point, and the bridge problems, either in an undirected or a directed graph, respectively.

18.

Parity, circuits, and the polynomial-time hierarchy

Merrick Furst James B. Saxe Michael Sipser 《Theory of Computing Systems》1984,17(1):13-27

A super-polynomial lower bound is given for the size of circuits of fixed depth computing the parity function. Introducing the notion of polynomial-size, constant-depth reduction, similar results are shown for the majority, multiplication, and transitive closure functions. Connections are given to the theory of programmable logic arrays and to the relativization of the polynomial-time hierarchy. Research partially funded by NSF Grant MCS-81-05555 and ONR Grant N00014-76-C-0370.

19.

All-to-all broadcast and matrix multiplication in faulty SIMD hypercubes

Sengupta A. Raghavendra C.S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(6):550-560

In this paper, we develop algorithms, in order of efficiency, for the all-to-all broadcast problem in an N=2ⁿ-node n-dimensional faulty SIMD hypercube, Q_n, with up to n-1 node faults. The algorithms use a property of a certain ordering of dimensions. Our analysis includes startup time (α) and transfer time (β). We have established the lower bound for such an algorithm to be nα+(2N-3)Lβ in a faulty hypercube with at most n-1 faults (each node has a value of L bytes). Our best algorithm requires 2nα+2NLβ and is near-optimal. We develop an optimal algorithm for matrix multiplication in a faulty hypercube using all-to-all broadcast and compare the efficiency of the all-to-all broadcast approach with the broadcast approach and the global sum approach for matrix multiplication. The algorithms are congestion-free and applicable in the context of available hypercube machines.

20.

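Application of matrix multiplication to computing the Fibonacci sequence

周卫星 陈思 张帆 《广东电脑与电讯》2017,1(9):71-73

This paper introduces several algorithmic approaches to the Fibonacci sequence, analyzes the time complexity of the recursive, bottom-up, and Binet's-formula algorithms, presents a method that lowers the time complexity by lifting the computation to a higher dimension through matrix multiplication, and compares the execution times of the implementations at different problem sizes. To address overflow, the long data type is replaced with BigInteger, and execution times for big-number computation are compared.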
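The matrix-multiplication approach summarized here can be sketched as follows. This is an illustration of the well-known identity [[1,1],[1,0]]^n = [[F(n+1),F(n)],[F(n),F(n-1)]] combined with exponentiation by squaring, not the paper's own code; the names `mat_mul` and `fib` are assumptions. Python integers have arbitrary precision, so they play the role that BigInteger plays in the paper, and the loop needs only O(log n) matrix products.

```python
def mat_mul(x, y):
    """Product of two 2 x 2 integer matrices."""
    return [
        [x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
        [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]],
    ]


def fib(n):
    """Return F(n), with F(0) = 0 and F(1) = 1, via fast matrix power."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = [[1, 0], [0, 1]]  # identity matrix
    base = [[1, 1], [1, 0]]    # Fibonacci step matrix
    while n > 0:
        if n & 1:              # this bit of n is set: multiply it in
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]
```

For example, `fib(10)` returns 55, and `fib(93)` already exceeds the range of a signed 64-bit integer, which is where a fixed-width long would overflow.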