Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
In this paper, we present several new and generalized parallel dense matrix multiplication algorithms of the form C = αAB + βC on two-dimensional process grid topologies. These algorithms can deal with rectangular matrices distributed on rectangular grids. We classify these algorithms coherently into three categories according to the communication primitives used and thus we offer a taxonomy for this family of related algorithms. All these algorithms are represented in the data-distribution-independent approach and thus do not require a specific data distribution for correctness. The algorithmic compatibility condition result shown here ensures the correctness of the matrix multiplication. We define and extend the data distribution functions and introduce permutation compatibility and algorithmic compatibility. We also discuss a permutation compatible data distribution (modified virtual 2D data distribution). We conclude that no single algorithm always achieves the best performance on different matrix and grid shapes. A practical approach to resolve this dilemma is to use poly-algorithms. We analyze the characteristics of each of these matrix multiplication algorithms and provide initial heuristics for using the poly-algorithm. All these matrix multiplication algorithms have been tested on the IBM SP2 system. The experimental results are presented in order to demonstrate their relative performance characteristics, motivating the combined value of the taxonomy and new algorithms introduced here. © 1997 by John Wiley & Sons, Ltd.
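
As a point of reference for the operation named in this abstract, the following is a minimal serial sketch of the update C = αAB + βC on rectangular matrices. It is not one of the paper's parallel algorithms; the function name `gemm` and the list-of-lists representation are assumptions made for illustration only.

```python
def gemm(alpha, A, B, beta, C):
    """Return alpha*A*B + beta*C for list-of-lists matrices.

    Serial reference only (hypothetical helper, not from the paper):
    A is m x k, B is k x n, C is m x n.
    """
    m, k = len(A), len(A[0])
    n = len(B[0])
    # Reject shapes for which the product or the sum is undefined.
    if len(B) != k or len(C) != m or any(len(row) != n for row in C):
        raise ValueError("incompatible shapes")
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(n)]
        for i in range(m)
    ]
```

A parallel variant would assign blocks of C to processes on the grid and exchange blocks of A and B; how that is done is exactly what the taxonomy in the abstract classifies, and it is not reproduced here.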

2.
New hybrid algorithms for matrix multiplication are proposed that have the lowest computational complexity in comparison with well-known matrix multiplication algorithms. Based on the proposed algorithms, efficient algorithms are developed for the basic operation \( D = C + \sum_{l=1}^{\xi} A_l B_l \) of cellular methods of linear algebra, where A, B, and D are square matrices of cell size. The computational complexity of the proposed algorithms is estimated.

3.
New hybrid algorithms are proposed for multiplying (n × n) matrices. They are based on Laderman's algorithm for multiplying (3 × 3) matrices. As compared with well-known hybrid matrix multiplication algorithms, the new algorithms are characterized by the minimum computational complexity. The multiplicative, additive, and overall complexities of the algorithms are estimated.

4.
A mixed cellular method of matrix multiplication is proposed that combines the Strassen method with a fast cellular method of matrix multiplication. The interaction of these methods makes it possible to decrease the multiplicative and additive complexities of well-known matrix multiplication algorithms by 25%. Estimates of computational complexity of cellular analogues of the mentioned algorithms are given. Translated from Kibernetika i Sistemnyi Analiz, No. 1, pp. 22–27, January–February 2009.
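
For readers unfamiliar with the Strassen method mentioned in this abstract, the sketch below shows the classical recursion, which uses 7 block multiplications per level instead of 8. It covers only the textbook Strassen scheme for square matrices whose order is a power of two; the cellular method and the mixed method of the paper are not described in the abstract and are not reproduced. All function names are illustrative.

```python
def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split a square matrix of even order into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B):
    """Multiply n x n matrices (n a power of two) with Strassen's recursion."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products replace the eight of the block-wise definition.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the four quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

In practice the recursion is usually cut off at some block size and handed to a conventional multiplication routine, since the extra additions dominate for small blocks.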

5.
Efficient covariance matrix update for variable metric evolution strategies   Cited by: 2 (self-citations: 0, citations by others: 2)
Randomized direct search algorithms for continuous domains, such as evolution strategies, are basic tools in machine learning. They are especially needed when the gradient of an objective function (e.g., loss, energy, or reward function) cannot be computed or estimated efficiently. Application areas include supervised and reinforcement learning as well as model selection. These randomized search strategies often rely on normally distributed additive variations of candidate solutions. In order to efficiently search in non-separable and ill-conditioned landscapes the covariance matrix of the normal distribution must be adapted, amounting to a variable metric method. Consequently, covariance matrix adaptation (CMA) is considered state-of-the-art in evolution strategies. In order to sample the normal distribution, the adapted covariance matrix needs to be decomposed, requiring in general \(\Theta(n^3)\) operations, where n is the search space dimension. We propose a new update mechanism which can replace a rank-one covariance matrix update and the computationally expensive decomposition of the covariance matrix. The newly developed update rule reduces the computational complexity of the rank-one covariance matrix adaptation to \(\Theta(n^2)\) without resorting to outdated distributions. We derive new versions of the elitist covariance matrix adaptation evolution strategy (CMA-ES) and the multi-objective CMA-ES. These algorithms are equivalent to the original procedures except that the update step for the variable metric distribution scales better in the problem dimension. We also introduce a simplified variant of the non-elitist CMA-ES with the incremental covariance matrix update and investigate its performance. Apart from the reduced time-complexity of the distribution update, the algebraic computations involved in all new algorithms are simpler compared to the original versions. The new update rule improves the performance of the CMA-ES for large scale machine learning problems in which the objective function can be evaluated fast.
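
The abstract does not state the update rule itself. The sketch below shows one algebraic identity that updates a factor of the covariance matrix directly in quadratic time; whether it matches the paper's rule term by term is an assumption. Given a factor A with C = AAᵀ and a vector v = Az, the matrix A' = √α·A + (√α/‖z‖²)(√(1 + (β/α)‖z‖²) − 1)·v·zᵀ satisfies A'A'ᵀ = αC + βvvᵀ. The function name and the parameters `alpha` and `beta` are illustrative, and the sketch assumes alpha > 0 and beta ≥ 0.

```python
import math

def rank_one_factor_update(A, z, alpha, beta):
    """Return A' such that A' A'^T = alpha * A A^T + beta * v v^T, with v = A z.

    Illustrative sketch (assumes alpha > 0, beta >= 0). The cost is O(n^2):
    one matrix-vector product plus one outer-product update, with no
    decomposition of the covariance matrix.
    """
    n = len(A)
    root_alpha = math.sqrt(alpha)
    s = sum(zi * zi for zi in z)  # squared norm of z
    if s == 0.0:
        # v is the zero vector, so only the scaling by alpha remains.
        return [[root_alpha * a for a in row] for row in A]
    v = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]  # v = A z
    coeff = root_alpha / s * (math.sqrt(1.0 + beta / alpha * s) - 1.0)
    return [[root_alpha * A[i][j] + coeff * v[i] * z[j] for j in range(n)]
            for i in range(n)]
```

The returned factor is in general not triangular, so it is a valid sampling factor (x = m + σA'z) rather than a Cholesky factor in the strict sense.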

6.
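Channel-coded MIMO systems require detectors with soft-input soft-output capability, but conventional detection algorithms usually have very high computational complexity, which hinders their use in practice. A low-complexity MIMO detection scheme is proposed. In the first iteration, a low-complexity fast decomposition scheme for a matrix sum is used to obtain the MMSE detection output, avoiding the Jordan canonical form reduction required by conventional inversion of a matrix sum; in the remaining iterations, the soft information provided by the channel decoder is used to convert the MIMO system into...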

7.
In this paper, we propose new adaptive algorithms for the extraction and tracking of the least (minor) or, possibly, the principal eigenvectors of a positive Hermitian covariance matrix. The main advantage of our proposed algorithms is their low computational complexity and numerical stability even in the minor component analysis case. The proposed algorithms are considered fast in the sense that their computational cost is O(np) flops per iteration, where n is the size of the observation vector and p < n is the number of eigenvectors to estimate. We consider OJA-type minor component algorithms based on the constrained and unconstrained stochastic gradient techniques. Using appropriate fast orthogonalization procedures, we introduce new fast algorithms that extract the minor (or principal) eigenvectors and guarantee good numerical stability as well as the orthogonality of their weight matrix at each iteration. In order to have a faster convergence rate, we propose a normalized version of these algorithms by seeking the optimal step-size. Our algorithms perform comparably to, or even better than, other existing algorithms of higher complexity, as illustrated by our simulation results.

8.
An interval algebra (IA) has been proposed as a model for representing and reasoning about qualitative temporal relations between time intervals. Unfortunately, reasoning tasks with IA that involve deciding the satisfiability of the temporal constraints, or providing all the satisfying instances of the temporal constraints, are NP-complete. That is, solving these problems is computationally exponential in the worst case. However, several directions in improving their computational performance are still possible. This paper presents a new backtracking algorithm for finding a solution called consistent scenario. This algorithm has an \(O(n^3)\) best-case complexity, compared to \(O(n^4)\) of previously known backtrack algorithms, where n denotes the number of intervals. By computational experiments, we tested the performance of different backtrack algorithms on a set of randomly generated networks, with the results favoring our proposal. In this paper, we also present a new path consistency algorithm, which has been used for finding approximate solutions towards the minimal labeling networks. The worst-case complexity of the proposed algorithm is still \(O(n^3)\); however, we are able to improve its performance by eliminating the unnecessary duplicate computation as presented in Allen's original algorithm, and by employing a most-constrained first principle, which ensures a faster convergence. The performance of the proposed scheme is evaluated through a large set of experimental data.

9.
Level set methods [Osher and Sethian. Fronts propagating with curvature-dependent speed: algorithms based on Hamilton–Jacobi formulations. J. Comput. Phys. 79 (1988) 12] have proved very successful for interface tracking in many different areas of computational science. However, current level set methods are limited by a poor balance between computational efficiency and storage requirements. Tree-based methods have relatively slow access times, whereas narrow band schemes lead to very large memory footprints for high resolution interfaces. In this paper we present a level set scheme for which both computational complexity and storage requirements scale with the size of the interface. Our novel level set data structure and algorithms are fast, cache efficient and allow for a very low memory footprint when representing high resolution level sets. We use a time-dependent and interface adapting grid dubbed the "Dynamic Tubular Grid" or DT-Grid. Additionally, it has been optimized for advanced finite difference schemes currently employed in accurate level set computations. As a key feature of the DT-Grid, the associated interface propagations are not limited to any computational box and can expand freely. We present several numerical evaluations, including a level set simulation on a grid with an effective resolution of \(1024^3\)

10.
This paper proposes a cellular method of matrix multiplication. The method reduces the multiplicative and additive complexities of well-known matrix multiplication algorithms by 12.5%. The computational complexities of cellular analogs of such algorithms are estimated. A fast cellular analog is presented whose multiplicative and additive complexities are equal to \(\approx 0.382n^3\) multiplications and \(\approx 1.147n^3\) additions, respectively, where n is the order of the matrices being multiplied. Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 55–59, May–June 2008.

11.
We present graphics processing unit (GPU) data structures and algorithms to efficiently solve sparse linear systems that are typically required in simulations of multi-body systems and deformable bodies. Thereby, we introduce an efficient sparse matrix data structure that can handle arbitrary sparsity patterns and outperforms current state-of-the-art implementations for sparse matrix vector multiplication. Moreover, an efficient method to construct global matrices on the GPU is presented where hundreds of thousands of individual element contributions are assembled in a few milliseconds. A finite-element-based method for the simulation of deformable solids as well as an impulse-based method for rigid bodies are introduced in order to demonstrate the advantages of the novel data structures and algorithms. These applications share the characteristic that a major computational effort consists of building and solving systems of linear equations in every time step. Our solving method results in a speed-up factor of up to 13 in comparison to other GPU methods.

12.
The block distributed memory model   Cited by: 1 (self-citations: 0, citations by others: 1)
We introduce a computation model for developing and analyzing parallel algorithms on distributed memory machines. The model allows the design of algorithms using a single address space and does not assume any particular interconnection topology. We capture performance by incorporating a cost measure for interprocessor communication induced by remote memory accesses. The cost measure includes parameters reflecting memory latency, communication bandwidth, and spatial locality. Our model allows the initial placement of the input data and pipelined prefetching. We use our model to develop parallel algorithms for various data rearrangement problems, load balancing, sorting, FFT, and matrix multiplication. We show that most of these algorithms achieve optimal or near optimal communication complexity while simultaneously guaranteeing an optimal speed-up in computational complexity. Ongoing experimental work in testing and evaluating these algorithms has thus far shown very promising results.

13.
In this paper we consider the problem of finding perfect matchings in parallel. We present an RNC algorithm with almost optimal work with respect to sequential algorithms, i.e., it uses \(O(n^{\omega})\) processors, where ω is the matrix multiplication exponent. Our algorithm is based on an RNC algorithm for computing the determinant of a degree one polynomial matrix, which is of independent interest. Research supported by KBN grant 1P03A01830.

14.
Max Restricted Path Consistency (maxRPC) is a local consistency for binary constraints that enforces a higher order of consistency than arc consistency. Despite the strong pruning that can be achieved, maxRPC is rarely used because existing maxRPC algorithms suffer from overheads and redundancies as they can repeatedly perform many constraint checks without triggering any value deletions. In this paper we propose and evaluate techniques that can boost the performance of maxRPC algorithms by eliminating many of these overheads and redundancies. These include the combined use of two data structures to avoid many redundant constraint checks, and the exploitation of residues to quickly verify the existence of supports. Based on these, we propose a number of closely related maxRPC algorithms. The first one, maxRPC3, has optimal \(O(end^3)\) time complexity, displays good performance when used stand-alone, but is expensive to apply during search. The second one, \(\mathrm{maxRPC3}^{rm}\), has \(O(en^2d^4)\) time complexity, but a restricted version with \(O(end^4)\) complexity can be very efficient when used during search. The other algorithms are simple modifications of \(\mathrm{maxRPC3}^{rm}\). All algorithms have \(O(ed)\) space complexity when used stand-alone. However, maxRPC3 has \(O(end)\) space complexity when used during search, while the others retain the \(O(ed)\) complexity. Experimental results demonstrate that the resulting methods constantly outperform previous algorithms for maxRPC, often by large margins, and constitute a viable alternative to arc consistency on some problem classes.

15.
A new fast matrix multiplication algorithm is proposed, which, as compared to the Winograd algorithm, has a lower multiplicative complexity equal to \(W_M \approx 0.437n^3\) multiplication operations. Based on a goal-directed transformation of its basic graph, new optimized architectures of systolic arrays are synthesized. A systolic variant of the Strassen algorithm is presented for the first time.

16.
Working in the framework of PAC-learning theory, we present special statistics for accomplishing in polynomial time proper learning of DNF boolean formulas having a fixed number of monomials. Our statistics turn out to be near sufficient for a large family of distribution laws, which we call butterfly distributions. We develop a theory of most powerful learning for analyzing the performance of learning algorithms, with particular reference to trade-offs between power and computational costs. Focusing attention on sample and time complexity, we prove that our algorithm works as efficiently as the best algorithms existing in the literature, while the latter only take care of subclasses of our family of distributions.

17.
The increasing demand for higher resolution images and higher frame rate videos will always pose a challenge to computational power when real-time performance is required to solve the stereo-matching problem in 3D reconstruction applications. Therefore, the use of asymptotic analysis is necessary to measure the time and space performance of stereo-matching algorithms regardless of the size of the input and of the computational power available. In this paper, we survey several classic stereo-matching algorithms with regard to time–space complexity. We also report running time experiments for several algorithms that are consistent with our complexity analysis. We present a new dense stereo-matching algorithm based on a greedy heuristic path computation in disparity space. A procedure which improves disparity maps in depth discontinuity regions is introduced. This procedure works as a post-processing step for any technique that solves the dense stereo-matching problem. We prove that our algorithm and post-processing procedure have optimal O(n) time–space complexity, where n is the size of a stereo image. Our algorithm performs only a constant number of computations per pixel since it avoids a brute force search over the disparity range. Hence, our algorithm is faster than "real-time" techniques while producing comparable results when evaluated with ground-truth benchmarks. The correctness of our algorithm is demonstrated with experiments in real and synthetic data.

18.
In this paper we describe a technique for finding efficient parallel algorithms for problems on directed graphs that involve checking the existence of certain kinds of paths in the graph. This technique provides efficient algorithms for finding dominators in flow graphs, performing interval and loop analysis on reducible flow graphs, and finding the feedback vertices of a digraph. Each of these algorithms takes \(O(\log^2 n)\) time using the same number of processors needed for fast matrix multiplication. All of these bounds are for an EREW PRAM.

19.
We propose two fast algorithms for abrupt change detection in streaming data that can operate on arbitrary unknown data distributions before and after the change. The first algorithm, MB-GT, computes efficiently the average Euclidean distance between all pairs of data points before and after the hypothesized change. The second algorithm, MB-CUSUM, computes the log-likelihood ratio statistic for the data distributions before and after the change, similarly to the classical CUSUM algorithm, but unlike that algorithm, MB-CUSUM does not need to know the exact distributions, and uses kernel density estimates instead. Although a straightforward computation of the two change statistics would have computational complexity of \(O(N^4)\) with respect to the size N of the streaming data buffer, the proposed algorithms are able to use the computational structure of these statistics to achieve a computational complexity of only \(O(N^2)\) and memory requirement of \(O(N)\). Furthermore, the algorithms perform surprisingly well on dependent observations generated by underlying dynamical systems, unlike traditional change detection algorithms.

20.
This paper presents optimal supervisory control of dynamical systems that can be represented by deterministic finite state automaton (DFSA) models. The performance index for the optimal policy is obtained by combining a measure of the supervised plant language with (possible) penalty on disabling of controllable events. The signed real measure quantifies the behaviour of controlled sublanguages based on a state transition cost matrix and a characteristic vector as reported in earlier publications. Synthesis of the optimal control policy requires at most n iterations, where n is the number of states of the DFSA model generated from the unsupervised plant language. The computational complexity of the optimal control synthesis is polynomial in n. Syntheses of the control algorithms are illustrated with two application examples.
