Found 20 similar articles; search took 15 ms.
1.
《Journal of Parallel and Distributed Computing》1995,31(1):25-40
Parallelizing sparse Simplex algorithms is one of the most challenging problems in computational science. We implemented the revised Simplex algorithm with LU decomposition on the Touchstone Delta and the iPSC/2. Because of very sparse matrices and very heavy communication, the ratio of computation to communication is extremely low. It becomes necessary to carefully select parallel algorithms, partitioning patterns, and communication optimization to achieve a reasonable speedup. Satisfactory performance has been obtained for a class of LP problems with high n/m ratios.
2.
T. Fahringer K. Sowa‐Piekło P. Czerwiński P. Brezany M. Bubak R. Koppler R. Wismüller 《Concurrency and Computation》2002,14(2):103-136
Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe the execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and intricate details of a program. In this paper we describe the design and implementation of SPiDER, which is an interactive source‐level debugging system for both regular and irregular High‐Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message‐passing programs with a high‐level debugger that interfaces with an HPF compiler. SPiDER, in addition to conventional debugging functionality, allows a single process of a parallel program to be inspected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data‐to‐processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support to evaluate the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real‐world applications. Several experiments are presented that demonstrate the usefulness of SPiDER. Copyright © 2002 John Wiley & Sons, Ltd.
3.
4.
《Journal of Parallel and Distributed Computing》2001,61(4):467-500
An increasing number of programming languages, such as Fortran 90, HPF, and APL, provide a rich set of intrinsic array functions and array expressions. These constructs, which constitute an important part of data parallel languages, provide excellent opportunities for compiler optimizations. The synthesis of consecutive array operations or array expressions into a composite access function of the source arrays at compile time has been shown (A. T. Budd, ACM Trans. Programm. Lang. Syst. 6 (July 1984), 297–313; G. H. Hwang et al., in “Proc. of ACM SIGPLAN Conference on Principles and Practice of Parallel Programming, 1995,” pp. 112–122) to be an effective scheme for optimizing programs on flat shared memory parallel architectures. It remains, however, to be studied how the synthesis scheme can be incorporated into optimizing HPF-like programs on distributed memory machines by taking into account communication costs. In this paper, we propose solutions to address this issue. We first show how to perform array operation synthesis on HPF programs, and we demonstrate its performance benefits on distributed memory machines with real applications. In addition, to prevent a situation we call “synthesis anomaly,” we present an optimal solution to guide the array synthesis process on distributed memory machines. Because the optimization problem is NP-hard, we further develop a practical strategy that compilers can use on distributed memory machines with HPF programs. Our synthesis engine is implemented as a Web-based tool, called Syntool, and experimental results show significant performance improvement over the base codes for HPF code fragments from real applications on parallel machines. Our experiments were performed on three distributed memory machines: an 8-node DEC Alpha Farm, a 16-node IBM SP-2, and a 16-node nCUBE/2.
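As a rough illustration of what it means to synthesize consecutive array operations into one composite access function, the Python sketch below compares evaluating TRANSPOSE(CSHIFT(A, 1)) through a temporary array with reading the source array directly through a composed index function. It is not taken from the paper: the helper names `cshift`, `transpose`, and `synthesized` are hypothetical, and the sketch ignores data distribution and communication costs, which are the paper's actual concern.

```python
def cshift(a, shift):
    """Circular shift along the first dimension, like Fortran's CSHIFT(A, SHIFT, DIM=1)."""
    n = len(a)
    return [a[(i + shift) % n] for i in range(n)]


def transpose(a):
    """Plain matrix transpose, like Fortran's TRANSPOSE(A)."""
    return [list(col) for col in zip(*a)]


def synthesized(a, shift):
    """Compute TRANSPOSE(CSHIFT(A, shift)) without building a temporary array.

    Composite access function (an illustration, not the paper's notation):
    result[i][j] = a[(j + shift) % n][i]
    """
    n, m = len(a), len(a[0])
    return [[a[(j + shift) % n][i] for j in range(n)] for i in range(m)]


A = [[1, 2, 3],
     [4, 5, 6]]

step_by_step = transpose(cshift(A, 1))  # two passes, one temporary
one_pass = synthesized(A, 1)            # one pass over the source array
```

Both variants produce the same array; the synthesized form simply avoids materializing the shifted intermediate.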
5.
High Performance Fortran (HPF) is a data-parallel language that provides a high-level interface for programming scientific applications, while delegating to the compiler the task of generating explicitly parallel message-passing programs. This paper provides an overview of HPF compilation and runtime technology for distributed-memory architectures, and deals with a number of topics in some detail. In particular, we discuss distribution and alignment processing, the basic compilation scheme and methods for the optimization of regular computations. A separate section is devoted to the transformation and optimization of independent loops with irregular data accesses. The paper concludes with a discussion of research issues and outlines potential future development paths of the language.
6.
Efficient programming of task-parallel problems, where the number and execution times of the computational tasks can vary unpredictably, demands an asynchronous and adaptive approach. In this sort of approach, however, such fundamental programming issues as load sharing, data sharing, and termination detection can present difficult programming problems. This paper presents the PMESC library for managing task-parallel problems on distributed-memory MIMD computers within the context of the SPMD (single program, multiple data) programming model. PMESC offers support for all of the application-independent programming issues involved in SPMD task-parallel computation in a portable and efficient way while still allowing users to customize their codes. Because different problems may require different strategies to achieve good performance, PMESC is based on a straightforward model in which different building blocks can be easily put together and changed to accommodate the particular needs of the different applications. The library provides an interface that allows users to program a virtual machine and thereby ignore the details associated with message passing and machine architecture. These features make PMESC accessible to a wide variety of users.
7.
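This paper explains the principle of calling Fortran from within the MATLAB environment and uses an example to show how to implement mixed-language programming with MATLAB and Fortran.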
8.
In this paper, we give an overview of the results of the CRAFT optimising compiler project (Fortran 90/HPF subset compilers). We start by describing the theoretical framework within which we designed program transformations for the optimization of inter- and intra-procedural data motion, as well as the optimizations for parallel loops; we then describe the implementation of the CRAFT compilers for Thinking Machines' CM-2 and CM-5. We report results from experiments on the Connection Machine CM-5, the IBM SP-2 and a network of UltraSparc workstations. The results demonstrate that these optimizations can achieve significant object code performance improvement. Copyright © 1999 John Wiley & Sons, Ltd.
9.
Keqin Li 《Journal of Parallel and Distributed Computing》2001,61(12):1709
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity $O(N^{\alpha})$, where $2<\alpha\le 3$. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in $O(\log N)$ time by using $N^{\alpha}/\log N$ processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all $1\le p\le N^{\alpha}/\log N$, multiplying two $N\times N$ matrices can be performed by a DMPC with $p$ processors in $O(N^{\alpha}/p)$ time, i.e., linear speedup and cost optimality can be achieved in the range $[1..N^{\alpha}/\log N]$. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. For instance, for all $1\le p\le N^{\alpha}/\log N$, multiplying two $N\times N$ matrices can be performed by $p$ processors connected by a hypercubic network in $O\big(N^{\alpha}/p+(N^{2}/p^{2/\alpha})(\log p)^{2(\alpha-1)/\alpha}\big)$ time, which implies that if $p=O\big(N^{\alpha}/(\log N)^{2(\alpha-1)/(\alpha-2)}\big)$, linear speedup can be achieved. Such a parallelization is highly scalable. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
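The linear-speedup condition for the hypercubic network can be checked from the stated running time. The derivation below is a reader's sketch, not part of the abstract. It assumes the exponents read as $2(\alpha-1)/\alpha$ and $2(\alpha-1)/(\alpha-2)$, and that $\log p = \Theta(\log N)$, which holds when $p$ is polynomial in $N$.

```latex
\begin{align*}
T(p) &= O\!\left(\frac{N^{\alpha}}{p}
        + \frac{N^{2}}{p^{2/\alpha}}\,(\log p)^{2(\alpha-1)/\alpha}\right),
        \qquad 2 < \alpha \le 3, \\[4pt]
\frac{N^{2}}{p^{2/\alpha}}\,(\log p)^{2(\alpha-1)/\alpha} \le \frac{N^{\alpha}}{p}
 &\iff p^{(\alpha-2)/\alpha} \le \frac{N^{\alpha-2}}{(\log p)^{2(\alpha-1)/\alpha}} \\[4pt]
 &\iff p \le \frac{N^{\alpha}}{(\log p)^{2(\alpha-1)/(\alpha-2)}} .
\end{align*}
```

The last step raises both sides to the power $\alpha/(\alpha-2)$, which is positive because $\alpha>2$. Under that bound on $p$ the communication term is dominated by $N^{\alpha}/p$, so $T(p)=O(N^{\alpha}/p)$.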
10.
Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers (cited 3 times in total: 0 self-citations, 3 citations by others)
Minimizing data communication among processors is the key to compiling programs for distributed memory multicomputers. In this paper, we propose new data partition and alignment techniques for partitioning and aligning data arrays with a program in a way that minimizes communication among processors. We use skewed alignment instead of the dimension-ordered alignment techniques to align data arrays. With the skewed scheme, we can handle more complex programs with less data communication than the dimension-ordered scheme allows. Finally, we compare the proposed scheme with the dimension-ordered alignment scheme experimentally. The experimental results show that our proposed scheme has more opportunities to align data arrays such that data communication among processors can be minimized.
11.
S.D. Kaushik C.-H. Huang P. Sadayappan 《Journal of Parallel and Distributed Computing》1996,38(2):237
In languages such as High Performance Fortran (HPF), array statements are used to express data parallelism. In compiling array statements for distributed-memory machines, efficient enumeration of local index sets and communication sets is important. A method based on a virtual processor approach has been proposed for efficient index set enumeration for array statements involving arrays distributed using block-cyclic distributions. The virtual processor approach is based on viewing a block-cyclic distribution as a block (or cyclic) distribution on a set of virtual processors, which are cyclically (or block-wise) mapped to the physical processors. The key idea of the method is to first develop closed forms in terms of simple regular sections for the index sets for arrays distributed using block or cyclic distributions. These closed forms are then used with the virtual processor approach to give an efficient solution for arrays with the block-cyclic distribution. HPF supports a two-level mapping of arrays to processors. Arrays are first aligned with a template at an offset and a stride and the template is then distributed among the processors using a regular data distribution. The introduction of a nonunit stride in the alignment creates “holes” in the distributed arrays which leads to memory wastage. In this paper, using simple mathematical properties of regular sections, we extend the virtual processor approach to address the memory allocation and index set enumeration problems for array statements involving arrays mapped using the two-level mapping. We develop a methodology for translating the closed forms for block and cyclically distributed arrays mapped using a one-level mapping to closed forms for arrays mapped using the two-level mapping. Using these closed forms, the virtual processor approach is extended to handle array statements involving arrays mapped using two-level mappings. Performance results on the Cray T3D are presented to demonstrate the efficacy of the extensions and identify various trade-offs associated with the proposed method.
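To make the notion of a local index set concrete, here is a minimal Python sketch of the standard one-level block-cyclic mapping with block size `k` over `P` processors. It assumes 0-based indices and unit-stride alignment, and the function names are hypothetical. It does not reproduce the paper's virtual-processor closed forms or its two-level mapping.

```python
def owner_and_local(i, k, P):
    """Map global index i (0-based) under a block-cyclic distribution with
    block size k over P processors to (owning processor, local index)."""
    block = i // k              # which block of size k the index falls in
    offset = i % k              # position inside that block
    proc = block % P            # blocks are dealt out round-robin
    local = (block // P) * k + offset
    return proc, local


def local_index_set(proc, n, k, P):
    """Enumerate the global indices in 0..n-1 owned by processor proc,
    stepping block by block instead of testing every index."""
    indices = []
    start = proc * k            # first block owned by this processor
    while start < n:
        indices.extend(range(start, min(start + k, n)))
        start += k * P          # next block owned by the same processor
    return indices
```

For example, with `n = 10`, `k = 2`, and `P = 3`, processor 0 owns global indices 0, 1, 6, and 7, and global index 7 is stored at local position 3 on that processor.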
12.
E. S. Borisov 《Cybernetics and Systems Analysis》2004,40(3):428-437
Parallel computers execute parallel programs that are transferred to other parallel architectures with difficulty and require special training of programmers. A parallelizing system is proposed that helps one to solve this problem.
13.
The spectral analysis of geological and geophysical data has been a fundamental tool in understanding Earth's processes. We present a Fortran 90 library for multitaper spectrum estimation, a state-of-the-art method that has been shown to outperform the standard methods. The library goes beyond power spectrum estimation and extracts for the user more information including confidence intervals, diagnostics for single frequency periodicities, and coherence and transfer functions for multivariate problems. In addition, the sine multitaper method can also be implemented. The library presented here provides the tools needed in multiple fields of the Earth sciences for the analysis of data as evident from various examples.
14.
Frietman Edward E. E. Ernst Ramon J. Crosbie Roy Shimoji Masao 《The Journal of supercomputing》1999,14(2):107-128
The antipodes of the class of sequential computers, executing tasks with a single CPU, are the parallel computers containing large numbers of computing nodes. In the shared-memory category, each node has direct access through a switching network to a memory bank, which can be composed of a single large or multiple medium-sized memory configurations. In contrast to the first category are the distributed memory systems, where each node is given direct access to its own local memory section. Running a program, especially in the latter category, requires a mechanism that gives access to multiple address spaces, that is, one for each local memory. Transfer of data can only be done from one address space to another. Along with the two categories are the physically distributed, shared-memory systems, which allow the nodes to explore a single globally shared address space. All categories, the performances of which are subject to the way the computing nodes are linked, need either a direct or a switched interconnection network for inter-node communication purposes. Linking nodes without taking into account the prerequisite of scalability when exploiting large numbers of them is not realistic, especially when the applied connection scheme must provide for fast and flexible communications at a reasonable cost. Different network topologies, varying from a single shared bus to a more complex elaboration of a fully connected scheme, and with them the corresponding intricate switching protocols have been extensively explored. A different vision is introduced concerning future prospects of an optically coupled distributed, shared-memory organized multiple-instruction, multiple-data system. In each cluster, an electrical crossbar looks after the interconnections between the nodes, the various memory modules and external I/O channels. The clusters themselves are optically coupled through a free space oriented data distributing system. Analogies found in the design of the Convex SPP1000 substantiate the closeness to reality of such an architecture. Following this introduction, an idealized picture of the fundamental properties of an optically based, fully connected, distributed, (virtual) shared-memory architecture is also outlined.
15.
Fortran 90 provides a rich set of array intrinsic functions. Each of these array intrinsic functions operates on the elements of multi-dimensional array objects concurrently. They provide a rich source of parallelism and play an increasingly important role in automatic support of data parallel programming. However, there is no such support if these intrinsic functions are applied to sparse data sets. In this paper, we address this gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes on distributed memory environments applicable to higher-dimensional sparse arrays. This way, programmers need not worry about low-level system details when developing sparse applications. Sparse programs can be expressed concisely using array expressions, and parallelized with the help of our library. Our sparse libraries are built for array intrinsics of Fortran 90, and they include an extensive set of array operations such as CSHIFT, EOSHIFT, MATMUL, MERGE, PACK, SUM, RESHAPE, SPREAD, TRANSPOSE, UNPACK, and section moves. Our work, to the best of our knowledge, is the first to provide sparse and parallel sparse support for array intrinsics of Fortran 90. In addition, we provide a complete complexity analysis for our sparse implementation. The complexity of our algorithms is proportional to the number of nonzero elements in the arrays, which is consistent with the conventional design criteria for sparse algorithms and data structures. Our current testbed is an IBM SP2 workstation cluster. Preliminary experimental results with numerical routines, numerical applications, and data-intensive applications related to OLAP (on-line analytical processing) show that our approach is promising in speeding up sparse matrix computations on both sequential and distributed memory environments if the programs are expressed with Fortran 90 array expressions.
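The claim that cost scales with the number of nonzero elements can be illustrated with a small Python sketch. The abstract does not say which compression schemes the library uses, so the coordinate-style dictionary below is only a stand-in, and the function names are hypothetical analogues of the SUM and TRANSPOSE intrinsics rather than the library's interface.

```python
def compress(dense):
    """Keep only nonzero entries as {(row, col): value}.

    This coordinate-style layout is an assumption made for illustration.
    """
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0}


def sparse_sum(sp):
    """Analogue of SUM(A): touches each stored nonzero exactly once."""
    return sum(sp.values())


def sparse_transpose(sp):
    """Analogue of TRANSPOSE(A): swaps the coordinates of each stored nonzero."""
    return {(j, i): v for (i, j), v in sp.items()}


dense = [[0, 0, 3],
         [4, 0, 0]]
sp = compress(dense)           # two stored entries instead of six
total = sparse_sum(sp)
transposed = sparse_transpose(sp)
```

After compression, both operations loop over the stored entries only, so their work grows with the nonzero count rather than with the full array size.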
16.
Sandrine Blazy 《Automated Software Engineering》2000,7(4):345-376
Partial evaluation is an optimization technique traditionally used in compilation. We have adapted this technique to the understanding of scientific application programs during their maintenance. We have implemented a tool that analyzes Fortran 90 application programs and performs an interprocedural pointer analysis. This paper presents a dynamic semantics of Fortran 90 and manually derives a partial evaluator from this semantics. The tool implementing the specifications is also detailed. The partial evaluator has been implemented in a generic programming environment and a graphical interface has been developed to visualize the information computed during the partial evaluation (values of variables, already analyzed procedures, scope of variables, removed statements, etc.).
17.
Paul R. Baumann 《Computers & Geosciences》1978,4(1):23-32
The construction of computer-generated isopleth maps has been limited generally to large computer programs such as SYMAP and AUTOMAP II. These programs may require machines with large amounts of memory. Large machines are not available to many potential users. This paper describes the program ISO, which is designed to create isopleth maps on small- and medium-size computers. ISO complements the CMAP program developed to produce choropleth maps on small computers and employs many of the features used in CMAP. It also provides some additional features such as limited map cosmetic capability, internal storage of scan lines for multiple map runs, and greater operational convenience in establishing map categories, symbols, and text. Proximal maps also can be generated by ISO.
18.
19.
Indranil Dasgupta Andrea Ruben Levi Vittorio Lubicz Claudio Rebbi 《Computer Physics Communications》1996,98(3):365-397
We present a complete set of Fortran 90 modules that can be used to write very compact, efficient, and high level QCD programs. The modules define fields (gauge, fermi, generators, complex, and real fields) as abstract data types, together with simpler objects such as SU(3) matrices or color vectors. Overloaded operators are then defined to perform all possible operations between the fields that may be required in a QCD simulation. QCD programs written using these modules need not have cumbersome subroutines and can be very simple and transparent. This is illustrated with two simple example programs.
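The idea of wrapping fields in abstract data types with overloaded operators can be sketched in a few lines of Python. This is an analogy only: the class `ColorMatrixField`, its layout (one 3x3 complex matrix per lattice site), and the helper `_matmul` are hypothetical and do not mirror the Fortran 90 modules' actual types or interfaces.

```python
def _matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


class ColorMatrixField:
    """A lattice field holding one 3x3 complex matrix per site (hypothetical type)."""

    def __init__(self, sites):
        self.sites = sites      # list of 3x3 nested lists, one per lattice site

    def __mul__(self, other):
        # Site-by-site matrix product, defined once and reused through `*`.
        return ColorMatrixField(
            [_matmul(a, b) for a, b in zip(self.sites, other.sites)])

    def __add__(self, other):
        # Site-by-site, element-by-element sum.
        return ColorMatrixField(
            [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
             for a, b in zip(self.sites, other.sites)])


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
U = ColorMatrixField([IDENTITY, [[0, 1j, 0], [1j, 0, 0], [0, 0, 1]]])
V = U * U                       # reads like the mathematics; no explicit site loop
W = U + U
```

With the loops hidden inside the operators, user code such as `U * U` stays close to the formulas it implements, which is the kind of transparency the abstract describes.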
20.
《Advances in Engineering Software》1999,30(5):313-325
The boundary element method is an established numerical technique for the solution of the partial differential equations of potential theory and elasticity. Here we present an implementation of the method using the advanced features of Fortran 90. We show how the array syntax, dynamic memory allocation and modularity allow the development of maintainable, readable and flexible boundary element codes. The ability to reuse large amounts of code independently of any particular integral equation is also demonstrated. Implementations for scalar and vector equations are presented, and the flexibility of the code is demonstrated by presenting multiple element types. The present implementation is illustrated by considering two numerical examples.