Similar Documents
 20 similar documents found (search time: 15 ms)
1.
In this paper, we address the matrix chain multiplication problem, i.e., the multiplication of several matrices. Although several studies have investigated the problem, our approach differs in several respects. First, we propose MapReduce algorithms that provide scalable computation for large matrices. Second, we transform the matrix chain multiplication problem from sequential multiplications of two matrices into a single multiplication of several matrices. Since matrix multiplication is associative, this transformation helps to improve the performance of the algorithms. To implement the idea, we adopt multi-way join algorithms in MapReduce that have been studied in recent years. In our experiments, we show that the proposed algorithms are fast and scalable compared to several baseline algorithms.
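As an illustration of the associativity that this transformation relies on, the following sketch (plain NumPy, not the paper's MapReduce multi-way join implementation) evaluates a three-matrix chain both as sequential pairwise products and as a single multi-way product over all index combinations:

```python
import numpy as np

rng = np.random.default_rng(0)
A1, A2, A3 = rng.random((4, 5)), rng.random((5, 3)), rng.random((3, 6))

# Sequential multiplications of two matrices at a time.
pairwise = (A1 @ A2) @ A3

# A single multi-way product: C[i, l] = sum_{j,k} A1[i, j] * A2[j, k] * A3[k, l].
multiway = np.einsum('ij,jk,kl->il', A1, A2, A3)

assert np.allclose(pairwise, multiway)  # associativity makes both results equal
```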

2.
In this paper we present some parallel algorithms for matrix addition, matrix multiplication, Gaussian elimination, and other related computations on sparse matrices. Our algorithms are designed for the hypercube and related networks, but they can be easily implemented on any other local memory machine. We prove that, under certain assumptions, on a hypercube or related network with p processors our algorithms achieve a speedup proportional to p/log p.

3.
In this paper we describe a complete solution for the first challenge of the VerifyThis 2016 competition held at the 18th ETAPS Forum. We present the proof of two variants of matrix multiplication: a naive version using three nested loops and Strassen's algorithm. The proofs are conducted using the Why3 platform for deductive program verification and automated theorem provers to discharge proof obligations. In order to specify and prove the two multiplication algorithms, we develop a new Why3 theory of matrices. In order to prove the matrix identities on which Strassen's algorithm is based, we apply the proof-by-reflection methodology, which we implement using ghost state. To our knowledge, this is the first time such a methodology has been used in an auto-active setting.

4.
Parallel factor analysis (PARAFAC) is a tensor (multiway array) factorization method that recovers hidden factors (component matrices) from multidimensional data. Most existing algorithms for PARAFAC, especially the alternating least squares (ALS) algorithm, need to compute Khatri-Rao products of tall factor matrices and multiplications of large matrices; they therefore incur high computational cost and large memory requirements and are not suitable for very large-scale problems. Hence, PARAFAC for large-scale data tensors remains a challenging problem. In this paper, we propose a new approach based on a modified ALS algorithm that computes Hadamard products instead of Khatri-Rao products and operates on relatively small matrices. The new algorithms are able to process extremely large-scale tensors with billions of entries. Extensive experiments confirm the validity and high performance of the developed algorithm in comparison with other well-known algorithms.
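The key linear-algebra fact behind replacing Khatri-Rao products with Hadamard products in ALS is that the Gram matrix of a Khatri-Rao product equals the Hadamard product of the factor Gram matrices. The sketch below (plain NumPy, not the paper's large-scale implementation) verifies this identity:

```python
import numpy as np

rng = np.random.default_rng(1)
B, C = rng.random((40, 5)), rng.random((60, 5))   # two factor matrices with R = 5 columns

# Column-wise Khatri-Rao product, shape (60 * 40, 5): tall and expensive to form.
KR = np.einsum('ir,jr->ijr', C, B).reshape(-1, 5)

gram_via_khatri_rao = KR.T @ KR               # touches the tall matrix
gram_via_hadamard = (C.T @ C) * (B.T @ B)     # only R x R matrices are involved

assert np.allclose(gram_via_khatri_rao, gram_via_hadamard)
```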

5.
Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in the ScaLAPACK matrix-matrix multiplication kernel can be tolerated without checkpointing or message logging. Previous work on algorithm-based fault tolerance has proved that, for matrix-matrix multiplication, the checksum relationship in the input checksum matrices is preserved at the end of the computation no matter which algorithm is chosen. From this checksum relationship in the final computation results, processor miscalculations can be detected, located, and corrected at the end of the computation. However, whether this checksum relationship can be maintained in the middle of the computation remained open. In this paper, we first demonstrate that, for many matrix-matrix multiplication algorithms, the checksum relationship in the input checksum matrices is not maintained in the middle of the computation. We then prove, however, that for the outer-product version of the algorithm the checksum relationship in the input checksum matrices can be maintained in the middle of the computation. Based on this checksum relationship maintained in the middle of the computation, we demonstrate that fail-stop process failures (which are often tolerated by checkpointing or message logging) in ScaLAPACK matrix-matrix multiplication can be tolerated without checkpointing or message logging.
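The checksum relationship referred to above is the classic algorithm-based fault-tolerance construction: extend A with a column-checksum row and B with a row-checksum column, and the product of the extended matrices is the full checksum matrix of C = AB. A minimal NumPy sketch (not the ScaLAPACK kernel itself):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = rng.random((4, 3)), rng.random((3, 5))

A_col = np.vstack([A, A.sum(axis=0)])                   # column-checksum matrix
B_row = np.hstack([B, B.sum(axis=1, keepdims=True)])    # row-checksum matrix

C_full = A_col @ B_row
C = A @ B

# The checksum relationship holds in the final result: the last row/column of
# C_full are the column/row sums of C, so an erroneous entry can be located.
assert np.allclose(C_full[:-1, :-1], C)
assert np.allclose(C_full[-1, :-1], C.sum(axis=0))
assert np.allclose(C_full[:-1, -1], C.sum(axis=1))
```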

6.
In many application domains, large-scale floating-point matrix multiplication is one of the most time-consuming computational kernels. Emerging applications frequently involve large matrices in which at least one dimension is small; we call matrices with this property non-uniform matrices. Because the on-chip memory available on an FPGA for storing intermediate results is very limited, computing large-scale matrix products usually requires partitioning the matrices into fine-grained sub-block computation tasks. When accelerating non-uniform matrix multiplication, most existing linear-array hardware matrix multipliers suffer a large performance loss because they only support a fixed block size. To address this problem, we propose an effective optimized blocking strategy. Based on it, we implement a matrix multiplier that supports variable block sizes on a Xilinx Zynq XC7Z045 FPGA. With 224 integrated processing elements, the multiplier achieves a measured performance of 48 GFLOPS at a 150 MHz clock frequency for non-uniform matrix multiplications from real applications, while requiring a bandwidth of only 4.8 GB/s. Experimental results show that the proposed blocking strategy achieves a performance improvement of up to 12% over traditional blocking algorithms.
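For reference, blocked (tiled) matrix multiplication with adjustable tile sizes looks as follows in software; this is only a minimal NumPy sketch of the general technique, not the FPGA design or the specific blocking strategy proposed in the paper:

```python
import numpy as np

def blocked_matmul(A, B, bm=64, bn=64, bk=64):
    """Tiled matrix product; bm, bn, bk are the (variable) block sizes."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(0, m, bm):
        for j in range(0, n, bn):
            for p in range(0, k, bk):
                # One fine-grained sub-block task: accumulate a bm x bn tile of C.
                C[i:i+bm, j:j+bn] += A[i:i+bm, p:p+bk] @ B[p:p+bk, j:j+bn]
    return C

rng = np.random.default_rng(3)
A, B = rng.random((100, 7)), rng.random((7, 130))   # "non-uniform": one dimension is small
assert np.allclose(blocked_matmul(A, B, bm=32, bn=32, bk=7), A @ B)
```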

7.
This note presents simple algorithms for obtaining the solutions of the Diophantine equation. Our methods can produce the classes of all solutions whose degree is lower than a specified bound. Previous algorithms involve troublesome computations, such as calculating both the controllability indexes and the observability indexes or solving a pole assignment problem. Our contribution is that our algorithm requires only basic matrix operations such as addition, subtraction, multiplication, and inversion of given real matrices. In addition, by solving simple linear equations, the class of all minimum-degree solutions can be given. Therefore, the computational effort is reduced compared with previous algorithms.

8.
9.
10.
11.
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used, and thus offer a taxonomy for this family of related algorithms. All these algorithms are expressed in a data-distribution-independent manner and therefore do not require a specific data distribution for correctness. The algorithmic compatibility condition shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation-compatible data distribution (a modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance on different matrix and grid shapes. A practical approach to resolve this dilemma is to use poly-algorithms. We analyze the characteristics of each of these matrix multiplication algorithms and provide initial heuristics for using the poly-algorithm. All these matrix multiplication algorithms have been tested on the IBM SP2 system. The experimental results are presented in order to demonstrate their relative performance characteristics, motivating the combined value of the taxonomy and new algorithms introduced here.
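As a serial reference for the operation these parallel algorithms compute, the sketch below (plain NumPy, not any of the paper's grid algorithms) performs the update C = αAB + βC by accumulating panel products over the inner dimension, the same accumulation that broadcast-based 2D-grid algorithms distribute across processes:

```python
import numpy as np

def gemm_update(alpha, A, B, beta, C, panel=2):
    """C := alpha * A @ B + beta * C, accumulated panel by panel over k."""
    k = A.shape[1]
    acc = np.zeros_like(C)
    for p in range(0, k, panel):
        acc += A[:, p:p+panel] @ B[p:p+panel, :]   # one panel of the inner dimension
    return alpha * acc + beta * C

rng = np.random.default_rng(4)
A, B, C = rng.random((6, 8)), rng.random((8, 5)), rng.random((6, 5))
assert np.allclose(gemm_update(2.0, A, B, 0.5, C), 2.0 * A @ B + 0.5 * C)
```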

12.
The applicability of fast multiplication algorithms to sparse structures is discussed. We give estimates of the degree of sparseness of matrices and polynomials for which fast multiplication algorithms have an advantage over standard multiplication algorithms in terms of multiplicative complexity. Specifically, the Karatsuba and Strassen algorithms are studied under the assumption of a uniform distribution of zero elements.
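For concreteness, the sketch below (NumPy, one recursion level only, not the paper's complexity analysis) shows the Strassen construction whose multiplicative cost is compared against the standard algorithm: seven block multiplications replace the usual eight.

```python
import numpy as np

def strassen_one_level(A, B):
    """One level of Strassen's algorithm for even-sized square matrices."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    M1 = (A11 + A22) @ (B11 + B22)   # 7 block products instead of 8
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

rng = np.random.default_rng(5)
A, B = rng.random((8, 8)), rng.random((8, 8))
assert np.allclose(strassen_one_level(A, B), A @ B)
```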

13.
Zhou, Jukai; Liu, Tong; Zhu, Jingting. Multimedia Tools and Applications, 2019, 78(23): 33415-33434.
K-means clustering is one of the most popular clustering algorithms and has been embedded in other clustering algorithms, e.g. as the last step of spectral clustering. In this paper, we propose two techniques that improve the previous k-means clustering algorithm by designing two different adjacency matrices. Extensive experiments on public UCI datasets show that the clustering results of our proposed algorithms significantly outperform three classical clustering algorithms in terms of different evaluation metrics.
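For context, the baseline that the paper improves on is standard k-means (Lloyd's algorithm). The sketch below is a minimal NumPy version of that baseline only; the paper's two adjacency-matrix-based variants are not reproduced here:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate point assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its cluster (keep it if the cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0, 6.0)])
labels, centers = kmeans(X, k=3)
```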

14.
Exascale computers are expected to have highly hierarchical architectures with nodes composed of multi-core processors (CPUs; central processing units) and accelerators (GPUs; graphics processing units). The different programming levels give rise to difficult new algorithmic issues. In particular, when solving extremely large linear systems, new programming paradigms for Krylov methods should be defined and evaluated against the modern state of the art of scientific computing. Iterative Krylov methods involve linear algebra operations such as dot products, norms, vector additions and sparse matrix-vector multiplications. These operations are computationally expensive for large matrices. In this paper, we focus on the best way to perform these operations effectively, in double precision, on GPUs in order to make iterative Krylov methods more robust and thereby reduce the computing time. The performance of our algorithms is evaluated on several matrices arising from engineering problems. Numerical experiments illustrate the robustness and accuracy of our implementation compared to existing libraries. We deal with different preconditioned Krylov methods: Conjugate Gradient for symmetric positive-definite matrices, and Generalized Conjugate Residual, Bi-Conjugate Gradient Conjugate Residual, transpose-free Quasi-Minimal Residual, Stabilized BiConjugate Gradient and Stabilized BiConjugate Gradient (L) for the solution of sparse linear systems with nonsymmetric matrices. We consider and compare several sparse compressed formats, and propose a way to implement Krylov methods effectively on GPUs and on multi-core CPUs. Finally, we give strategies for faster algorithms by auto-tuning the threading design according to the problem characteristics and the hardware. As a conclusion, we propose and analyse hybrid sub-structuring methods that should pave the way to exascale hybrid methods.
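As a compact reminder of the kernel structure these GPU implementations target, here is a minimal dense NumPy sketch of unpreconditioned Conjugate Gradient (not the paper's GPU or sparse-format code); every step is one of the operations listed above — a matrix-vector product, dot products, norms, and vector updates:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                     # (sparse) matrix-vector product
        alpha = rs_old / (p @ Ap)      # dot products
        x += alpha * p                 # vector updates (axpy)
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # norm check
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

rng = np.random.default_rng(6)
M = rng.random((50, 50))
A = M @ M.T + 50 * np.eye(50)          # symmetric positive definite
b = rng.random(50)
assert np.allclose(A @ conjugate_gradient(A, b), b, atol=1e-6)
```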

15.
This paper introduces the Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) model together with its basic data-transfer operations, and uses matrix multiplication and sorting as examples to present parallel algorithms on the LARPBS and their design methodology.

16.
We study the performance of algebraic operations on large sparse matrices stored on secondary storage and show how the traditional algorithms can be fine-tuned in order to minimize the number of page accesses. We develop cost equations for performing multiplication, transposition, and Gaussian elimination on various secondary storage schemes for sparse matrices and show how these can be incorporated into a selection model which chooses the optimal sequence of storage schemes for a given mix of operations. Furthermore, we present the results of a number of experiments and compare our analytical results with experimental results obtained on synthetically generated data.

17.
In this paper, we develop and evaluate two new algorithms for checking emptiness of alternating automata. These algorithms build on previous work. First, they rely on antichains to efficiently manipulate the state spaces underlying the analysis of alternating automata. Second, they are abstract algorithms with built-in refinement operators based on techniques that exploit information computed by abstract fixed points (rather than counter-examples, as is usually the case). The efficiency of our new algorithms is illustrated by experimental results.

18.
We present a new static analysis by abstract interpretation to prove automatically the functional correctness of algorithms implementing matrix operations, such as matrix addition, multiplication, general matrix multiplication, inversion, or more generally Basic Linear Algebra Subprograms. In order to do so, we introduce a family of abstract domains parameterized by a set of matrix predicates as well as a numerical domain. We show that our analysis is robust enough to prove the functional correctness of several versions of the same matrix operations, resulting from loop reordering, loop tiling, inverting the iteration order, line swapping, and expression decomposition. We extend our method to enable modular analysis on code fragments manipulating matrices by reference, and show that it results in a significant analysis speedup.

19.
In this paper we present four discrete versions of two existing honey bee optimization algorithms: the discrete artificial bee colony algorithm (DABC) and three versions of the discrete fast marriage in honey bee optimization algorithm (DFMBO1, DFMBO2, and DFMBO3). In these discretized algorithms we utilize three logical operators, namely OR, AND and XOR. We then compare the performance of our algorithms with that of three other bee algorithms, i.e. the artificial bee colony (ABC), the queen bee (QB), and the fast marriage in honey bee optimization (FMBO), on four benchmark functions for various numbers of variables up to 100. The obtained results show that our discrete algorithms are faster than the other algorithms. In general, when the required precision and the number of variables are low, the difference in speed between our new algorithms and the other three algorithms is small; but as the precision and the number of variables increase, the number of function evaluations needed by the other algorithms grows beyond manageable amounts, and hence their success rates decrease. Among our proposed discrete algorithms, DFMBO3 is always fast and achieves a success rate of 100% on all benchmarks with an average number of function evaluations of not more than 1010.

20.
This is the first of two papers that use concepts from graph theory to obtain a deeper understanding of the mathematical foundations of multibody dynamics. The key contribution is the development of a unifying framework that shows that key analytical results and computational algorithms in multibody dynamics are a direct consequence of structural properties and require minimal assumptions about the specific nature of the underlying multibody system. This first part focuses on identifying the abstract graph-theoretic structural properties of spatial operator techniques in multibody dynamics. The second part exploits these structural properties to develop a broad spectrum of analytical results and computational algorithms. Towards this, we begin with the notion of graph adjacency matrices and generalize it to define block-weighted adjacency (BWA) matrices and their 1-resolvents. Previously developed spatial operators are shown to be special cases of such BWA matrices and their 1-resolvents. These properties are shown to hold broadly for serial- and tree-topology multibody systems. Specializations of the BWA and 1-resolvent matrices are referred to as spatial kernel operators (SKO) and spatial propagation operators (SPO). These operators and their special properties provide the foundation for the analytical and algorithmic techniques developed in the companion paper. We also use graph-theoretic concepts to study the topology-induced sparsity structure of these operators and of the system mass matrix. Similarity transformations of these operators are also studied. While the detailed development is done for the case of rigid-link multibody systems, the extension of these techniques to a broader class of systems (e.g. deformable links) is illustrated.
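A small illustration of the structural fact that underlies 1-resolvents of adjacency-type matrices for serial and tree topologies: because the weighted adjacency matrix of such a system can be ordered to be strictly lower triangular and hence nilpotent, its 1-resolvent (I - A)^(-1) equals a finite sum of powers of A. The scalar weights below are purely illustrative assumptions, not the paper's spatial operators:

```python
import numpy as np

# A serial 3-link chain: entry (i, j) is nonzero only when body i is the child of body j.
A = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],    # body 2 coupled to body 1
              [0.0, 3.0, 0.0]])   # body 3 coupled to body 2
n = A.shape[0]

resolvent = np.linalg.inv(np.eye(n) - A)
power_sum = sum(np.linalg.matrix_power(A, k) for k in range(n))   # A**n = 0 (nilpotent)

assert np.allclose(resolvent, power_sum)
```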
