20 similar records found; search took 9 ms
1.
The computational speed of individual processors in distributed memory computers is increasing faster than the communication speed of the interconnection networks. This has led to the general perception among developers of compilers for data-parallel languages that overlapping communications with computations is an important optimization. We demonstrate that communication–computation overlap has limited utility. Overlapping communications with computations can never more than double the speed of a parallel application, and in practice the relative improvement in speed is usually far less than that. Most parallel algorithms have computational requirements that grow faster than their communication requirements. When this is the case, the gain from communication–computation overlap asymptotically approaches zero as the problem size increases.
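The abstract's two claims follow from a simple two-term time model. The sketch below uses illustrative O(n^3) computation and O(n^2) communication costs (an assumption for the example, not taken from the paper) to show both the factor-of-two cap and the vanishing asymptotic gain:

```python
def overlap_speedup(t_comp, t_comm):
    """Speedup from perfectly overlapping communication with computation.

    Without overlap the elapsed time is t_comp + t_comm; with ideal
    overlap it is max(t_comp, t_comm).  The ratio can never exceed 2.
    """
    return (t_comp + t_comm) / max(t_comp, t_comm)

# As problem size n grows, computation (here O(n**3)) outpaces
# communication (here O(n**2)), so the gain from overlap vanishes.
for n in (10, 100, 1000):
    print(n, overlap_speedup(n**3, n**2))
```

The maximum (exactly 2) is reached only when computation and communication times are equal, which the balanced-growth argument in the abstract rules out asymptotically.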
2.
《Journal of Parallel and Distributed Computing》1993,18(2):129-146
Observing the activities of a complex parallel computer system is no small feat, and relating these observations to program behavior is even harder. In this paper, we present a general measurement approach that is applicable to a large class of scalable programs and machines, specifically SPMD and data-parallel programs executing on distributed memory computer systems. The combined instrumentation and visualization paradigm, called VISTA, is based on our experiences in programming and monitoring applications running on an nCUBE 2 computer and a MasPar MP-1 computer. The key is that performance data are treated similarly to any distributed data in the context of the programming models and presented via a hierarchy of multiple views. Because of the data-parallel mapping of program onto machine, we can view the performance as it relates to each processor, processor cluster, or the processor ensemble and as it relates to the data structures of the program. We illustrate the utility of VISTA by example.
3.
4.
V. V. Khilenko 《Cybernetics and Systems Analysis》2001,37(4):596-599
The mathematical apparatus of decomposition is used to solve the problem of analysis and computation of stiff stochastic systems of differential equations. A theorem substantiating the adequacy of a solution obtained is formulated and an algorithm of computation of stiff stochastic systems by the method of depression of equations is given.
5.
《Journal of Parallel and Distributed Computing》1995,26(1):72-84
Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We demonstrate a storage scheme for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution that does not waste any storage, and show that, under this storage scheme, the local memory access sequence of any processor for a computation involving the regular section A(ℓ:h:s) is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and we extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
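A direct (unoptimized) way to see the structure the paper exploits is to enumerate one processor's local accesses for a section A(l:h:s) under cyclic(k). The storage formula below is a standard cyclic(k) layout assumed for illustration, not necessarily the paper's exact scheme:

```python
def local_access_sequence(l, h, s, k, P, p):
    """Local memory offsets touched by processor p for the regular
    section A(l:h:s) under a cyclic(k) distribution over P processors.

    Element with global index i lives on processor (i // k) % P, at
    local offset (i // (k * P)) * k + i % k.  The difference sequence
    of the offsets returned here is eventually periodic with period at
    most k, which is what the paper's finite state machine encodes.
    """
    seq = []
    for i in range(l, h + 1, s):
        if (i // k) % P == p:                        # ownership test
            seq.append((i // (k * P)) * k + i % k)   # local offset
    return seq

offs = local_access_sequence(0, 63, 3, k=4, P=4, p=0)
deltas = [b - a for a, b in zip(offs, offs[1:])]
```

The paper's contribution is computing the periodic delta sequence directly, in time independent of the array extent, rather than by the brute-force scan shown here.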
6.
7.
《Journal of Parallel and Distributed Computing》2000,60(2):189-216
This paper presents compilation techniques used to compress holes, which are caused by the nonunit alignment stride in a two-level data-processor mapping. Holes are the memory locations mapped by useless template cells. To fully utilize the memory space, memory holes should be removed. In a two-level data-processor mapping, there is a repetitive pattern for array elements mapped onto processors. We classify blocks into classes and use a class table to record the distribution of each class in the first repetitive data distribution pattern. Similarly, data distribution on a processor also has a repetitive pattern. We use a compression table to record the distribution of each block in the first repetitive data distribution pattern on a processor. By using a class table and a compression table, hole compression can be easily and efficiently achieved. Compressing holes can save memory usage, improve spatial locality, and further improve system performance. The proposed method is efficient, stable, and easy to implement. The experimental results confirm the advantages of our proposed method over existing methods.
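The idea of tabulating one repetitive period can be sketched as follows. The alignment model (an array element occupies template cell t iff t % stride == offset) and the table layout are simplifying assumptions for the sketch, not the paper's exact class and compression tables:

```python
from math import lcm

def build_compression_table(stride, offset, k, P, p):
    """Map used local slots (non-holes) to compacted addresses for one
    repetitive period of a cyclic(k) distribution over P processors.

    A template cell t holds an array element iff t % stride == offset
    (a simplified affine alignment).  Local slots owned by processor p
    whose template cell holds no element are the 'holes' the paper
    compresses away; used slots are renumbered densely.
    """
    period = lcm(stride, k * P)   # the pattern repeats after this many cells
    table = {}
    for t in range(period):
        if (t // k) % P != p:
            continue                              # cell lives elsewhere
        slot = (t // (k * P)) * k + t % k         # uncompressed local slot
        if t % stride == offset:
            table[slot] = len(table)              # next compacted address
    return table
```

Because the pattern repeats, one table of this size suffices for the whole array; a lookup plus a period-sized offset replaces per-element recomputation.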
8.
1. Introduction. Scalability analysis of parallel algorithms on parallel architectures is currently one of the central problems in massively parallel processing (MPP) research. As the principal performance metric for parallel algorithms on massively parallel machines, scalability reveals, for high-performance computers, …
9.
《Journal of Parallel and Distributed Computing》1994,22(3):379-391
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objectives of this paper are to critically assess the state of the art in the theory of scalability analysis, and to motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms that have been developed for studying scalability issues, and discuss their interrelationships. For example, we derive an important relationship between time-constrained scaling and the isoefficiency function. We point out some of the weaknesses of the existing schemes for measuring scalability, and discuss possible ways of extending them.
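The isoefficiency function mentioned in this abstract can be illustrated with the textbook example of summing n numbers by tree reduction; the cost model and constants below are a standard illustration, not taken from this survey:

```python
import math

def efficiency(n, P):
    """Efficiency E = T_1 / (P * T_P) for adding n numbers on P
    processors with a tree reduction: T_P = n/P + 2*log2(P) under a
    unit-cost model (illustrative constants)."""
    t_p = n / P + 2 * math.log2(P)
    return n / (P * t_p)

# For fixed n, efficiency decays as P grows; scaling the work as
# n = Theta(P * log P) holds efficiency constant -- that growth rate
# is the isoefficiency function of this algorithm-architecture pair.
e_64 = efficiency(4 * 64 * math.log2(64), 64)
e_1024 = efficiency(4 * 1024 * math.log2(1024), 1024)
```

With n = 4 P log2 P both efficiencies equal 2/3 exactly, showing how the isoefficiency function prescribes the problem-size growth needed to keep all processors usefully busy.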
10.
《Journal of Parallel and Distributed Computing》1994,21(1):124-139
Data-parallel implementations of the computationally intensive task of solving multiple quadratic forms (MQFs) have been examined. Coupled and uncoupled parallel methods are investigated, where coupling relates to the degree of interaction among the processors. Also, the impact of partitioning a large MQF problem into smaller non-interacting subtasks is studied. Trade-offs among the implementations for various data-size/machine-size ratios are categorized in terms of complex arithmetic operation counts, communication overhead, and memory storage requirements. Furthermore, the impact on performance of the mode of parallelism used is considered, specifically, SIMD versus MIMD versus SIMD/MIMD mixed-mode. From the complexity analyses, it is shown that none of the algorithms presented in this paper is best for all data-size/machine-size ratios. Thus, to achieve scalability (i.e., good performance as the number of processors available in a machine increases), instead of using a single algorithm, the approach discussed is to have a set of algorithms from which the most appropriate algorithm or combination of algorithms is selected based on the ratio calculated from the scaled machine size. The analytical results have been verified by experiments on the MasPar MP-1 (SIMD), nCUBE 2 (MIMD), and PASM (mixed-mode) prototype.
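The fully uncoupled case is easy to sketch: each quadratic form x^T A x is independent of the others, so processors evaluating different forms need not interact. A minimal real-valued version (the paper works with complex arithmetic; the function names are illustrative):

```python
def quadratic_form(A, x):
    """Evaluate the quadratic form x^T A x for a square matrix A
    (given as a list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def solve_mqf(A, xs):
    """Uncoupled evaluation of multiple quadratic forms: the forms are
    independent, so in a data-parallel setting each processor could be
    assigned a subset of the vectors xs with no interprocessor
    communication."""
    return [quadratic_form(A, x) for x in xs]
```

The coupled methods in the paper trade this independence for shared intermediate results, which is where the communication-overhead terms of the complexity analysis come from.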
11.
12.
A display-level system using specialized processors offers a faster method of modeling visually complex objects.
13.
Jiří Barnat Jakub Chaloupka Jaco van de Pol 《Electronic Notes in Theoretical Computer Science》2008,198(1):201
We study and improve the OBF technique [Barnat, J. and P. Moravec, Parallel algorithms for finding SCCs in implicitly given graphs, in: Proceedings of the 5th International Workshop on Parallel and Distributed Methods in Verification (PDMC 2006), LNCS (2007)], which was used in distributed algorithms for the decomposition of a partitioned graph into its strongly connected components. In particular, we introduce a recursive variant of OBF and experimentally evaluate several different implementations of it that vary in the degree of parallelism. For the evaluation we used synthetic graphs with a few large components and graphs with many small components. We also experimented with graphs that arise as state spaces in real model checking applications. The experimental results are compared with those of other successful SCC decomposition techniques [Orzan, S., “On Distributed Verification and Verified Distribution,” Ph.D. thesis, Free University of Amsterdam (2004); Fleischer, L.K., B. Hendrickson and A. Pinar, On identifying strongly connected components in parallel, in: Parallel and Distributed Processing, IPDPS Workshops, Lecture Notes in Computer Science 1800, 2000, pp. 505–511].
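One of the baselines cited here, the Fleischer-Hendrickson-Pinar forward-backward algorithm, can be sketched compactly. This sequential sketch only shows the recursion that the distributed variants parallelize:

```python
def reach(v, edges, allowed):
    """Vertices reachable from v following 'edges', restricted to the
    'allowed' vertex set."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in edges.get(u, ()):
            if w in allowed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def fb_scc(vertices, succ, pred):
    """Forward-backward SCC decomposition: the SCC of a pivot is the
    intersection of its forward and backward reachable sets, and the
    three remaining regions contain no SCC crossing their boundaries,
    so they can be processed independently (in parallel in the
    distributed setting)."""
    if not vertices:
        return []
    pivot = next(iter(vertices))
    fwd = reach(pivot, succ, vertices)
    bwd = reach(pivot, pred, vertices)
    scc = fwd & bwd
    rest = [fwd - scc, bwd - scc, vertices - fwd - bwd]
    return [scc] + [c for part in rest for c in fb_scc(part, succ, pred)]
```

OBF refines this picture by slicing the graph into OWCTY-BWD-FWD regions first, which is what the recursive variant studied in the paper iterates on.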
14.
15.
Fayez AlFayez 《Computer Systems Science & Engineering》2023,44(3):2165-2176
The paper addresses the challenge of transmitting a large number of files stored in a data center (DC), encrypting them by compilers, and sending them through a network in acceptable time. Given the large number of files, a single compiler may not be sufficient to encrypt the data in acceptable time. In this paper, we therefore consider the problem of scheduling the given files across several compilers, seeking an algorithm that yields an efficient schedule. The main objective of the work is to minimize the gap in total assigned file size between compilers; this minimization ensures a fair distribution of files across the compilers. This scheduling problem is considered very hard. The paper presents two research axes. The first axis is architectural: in this context, we propose a novel pre-compiler architecture. The second axis is algorithmic: we develop six algorithms to solve the problem, based on the dispatching-rules method, a decomposition method, and an iterative approach. These algorithms give approximate solutions to the studied problem. Experiments were carried out to assess the algorithms, using several performance indicators and five classes of test instances totaling 2350 instances. Comparisons between the proposed algorithms are presented and discussed in several tables. The results show that the best algorithm is the Iterative-mixed Smallest-Longest Heuristic (ISL), with a percentage of 97.7% and an average running time of 0.148 s; no other algorithm exceeded 22%. The best algorithm after ISL is the Iterative-mixed Longest-Smallest Heuristic (ILS), with a percentage of 21.4% and an average running time of 0.150 s.
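The dispatching-rule flavor of these algorithms can be illustrated with the classic longest-processing-time rule: assign files largest-first to the least-loaded compiler. This is a generic sketch in the same spirit, not the paper's ISL/ILS algorithms:

```python
import heapq

def longest_first_schedule(sizes, m):
    """Greedy dispatching rule: assign each file, largest first, to the
    compiler with the smallest current load (the classic LPT heuristic).
    Returns the per-compiler loads and the load gap being minimized."""
    heap = [(0, i) for i in range(m)]       # (current load, compiler id)
    heapq.heapify(heap)
    loads = [0] * m
    for s in sorted(sizes, reverse=True):
        load, i = heapq.heappop(heap)       # least-loaded compiler
        loads[i] = load + s
        heapq.heappush(heap, (loads[i], i))
    return loads, max(loads) - min(loads)
```

Sorting largest-first matters: placing big files early leaves the small ones free to even out the residual imbalance, which is exactly the gap objective studied in the paper.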
16.
Rafał Somla 《Electronic Notes in Theoretical Computer Science》2005,119(1):51
We present new algorithms for determining optimal strategies for two-player games with probabilistic moves and reachability winning conditions. Such games, known as simple stochastic games, were extensively studied by A. Condon [Anne Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992; Anne Condon. On algorithms for simple stochastic games. In Jin-Yi Cai, editor, Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. AMS, 1993]. Many interesting problems, including parity games and hence also mu-calculus model checking, can be reduced to simple stochastic games. It is an open problem whether simple stochastic games can be solved in polynomial time. Our algorithms determine the optimal expected payoffs in the game. We use a geometric interpretation of the search space as a subset of the hypercube [0,1]^N. The main idea is to divide this set into convex subregions in which linear optimization methods can be used. We show how one can proceed from one subregion to another so that, eventually, a region containing the optimal payoffs will be found. The total number of subregions is exponential in the size of the game but, in practice, the algorithms need to visit only a few of them to find a solution. We believe that our new algorithms could provide new insights into the difficult problem of determining the algorithmic complexity of simple stochastic games and other, equivalent problems.
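For contrast with the subregion method, the standard value-iteration baseline for simple stochastic games (not the paper's algorithm) is easy to state; the game encoding below is an assumption made for the sketch:

```python
def ssg_value_iteration(game, iters=200):
    """Value iteration for a simple stochastic game.  'game' maps each
    vertex to ('max'|'min'|'avg', successor list), except sink vertices,
    which map to ('sink', payoff).  The max player picks the best
    successor, the min player the worst, and 'avg' vertices move to a
    uniformly random successor.  Returns approximate optimal expected
    payoffs (exact optimality needs the kind of methods the paper
    develops; iteration only converges in the limit in general)."""
    v = {s: 0.0 for s in game}
    for _ in range(iters):
        nv = {}
        for s, (kind, arg) in game.items():
            if kind == 'sink':
                nv[s] = arg
            elif kind == 'max':
                nv[s] = max(v[t] for t in arg)
            elif kind == 'min':
                nv[s] = min(v[t] for t in arg)
            else:  # 'avg'
                nv[s] = sum(v[t] for t in arg) / len(arg)
        v = nv
    return v
```

The optimal payoffs always lie in [0,1]^N, which is the hypercube the paper partitions into convex subregions.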
17.
This paper studies the convergence properties of a general class of decomposition algorithms for support vector machines (SVMs). We provide a model algorithm for decomposition, and prove necessary and sufficient conditions for stepwise improvement of this algorithm. We introduce a simple rate certifying condition and prove a polynomial-time bound on the rate of convergence of the model algorithm when it satisfies this condition. Although it is not clear that existing SVM algorithms satisfy this condition, we provide a version of the model algorithm that does. For this algorithm we show that when the slack multiplier C satisfies 1/2 ≤ C ≤ mL, where m is the number of samples and L is a matrix norm, then it takes no more than 4LC^2m^4/ε iterations to drive the criterion to within ε of its optimum.
18.
Farid Ablayev Alexander Vasiliev 《International Journal of Software and Informatics》2013,7(4):485-500
In the paper, we develop a method for constructing quantum algorithms for computing Boolean functions by quantum ordered read-once branching programs (quantum OBDDs). Our method is based on the fingerprinting technique and the representation of Boolean functions by their characteristic polynomials. We use circuit notation for branching programs to present the resulting algorithms. For several known functions our approach provides optimal QOBDDs. Namely, we consider functions such as MODm, EQn, Palindromen, and PERMn (testing whether a given Boolean matrix is a permutation matrix). We also propose a generalization of our method and apply it to the Boolean variant of the Hidden Subgroup Problem.
19.
The various algorithms evolved from LEACH always retain a "random selection" component in cluster-head election, which limits wireless sensor networks in optimizing topology and balancing energy consumption. Starting from the idea of differentiating cluster-head functions and optimizing the election mechanism for functional nodes, a distributed algorithm that differentiates cluster-head functions is proposed. A functional-node recommendation mechanism is introduced to weaken the random component in cluster-head election, and the three traditional cluster-head functions of node management, data fusion, and information forwarding are split among three separate functional nodes: a management node, a fusion node, and a forwarding node. Simulation data show that the proposed clustering algorithm effectively optimizes intra-cluster topology, improves the balance of node energy consumption, and extends the network lifetime by 15%-20%.
20.
O. N. Granichin 《Automation and Remote Control》2002,63(2):209-219
New algorithms for stochastic approximation under input disturbance are designed. For the multidimensional case, they are simple in form, generate consistent estimates for unknown parameters under almost arbitrary disturbances, and are easily incorporated in the design of quantum devices for estimating the gradient vector of a function of several variables.
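The flavor of such randomized (simultaneous-perturbation) stochastic approximation can be sketched as follows; the gain sequences and the test function are illustrative choices for the sketch, not those of the paper:

```python
import random

def spsa_step(f, theta, a, b):
    """One randomized stochastic-approximation step in the spirit of
    the algorithms described above: perturb every coordinate
    simultaneously with a random +/-1 vector and estimate the gradient
    from just two measurements of f, regardless of dimension."""
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    f_plus = f([t + b * d for t, d in zip(theta, delta)])
    f_minus = f([t - b * d for t, d in zip(theta, delta)])
    g = (f_plus - f_minus) / (2 * b)      # scalar directional estimate
    return [t - a * g / d for t, d in zip(theta, delta)]

random.seed(0)
theta = [2.0, -3.0]
for k in range(1, 2001):                  # decreasing gains a_k ~ 1/k
    theta = spsa_step(lambda x: sum(t * t for t in x), theta,
                      a=0.25 / k, b=0.01)
```

Needing only two function evaluations per step, independent of dimension, is what makes this family attractive for the gradient-estimation devices mentioned in the abstract.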