期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient algorithms for parallel sorting on mesh multicomputers

V. Singh V. Kumar G. Agha C. Tomlinson 《International journal of parallel programming》1991,20(2):95-131

We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al. ⁽¹⁾ and Schnorret al. ⁽²⁾ have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort. 相似文献

2.

Efficient broadcasting in wormhole-routed multicomputers: anetwork-partitioning approach

Yu-Chee Tseng San-Yuan Wang Chin-Wen Ho 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(1):44-61

In this paper, a network-partitioning approach for one-to-all broadcasting on wormhole-routed networks is proposed. To broadcast a message, the scheme works in three phases. First, a number of data-distributing networks (DDNs), which can work independently, are constructed. Then the message is evenly divided into submessages, each being sent to a representative node in one DDN. Second, the submessages are broadcast on the DDNs concurrently. Finally, a number of data-collecting networks (DCNs), which can work independently too, are constructed. Then, concurrently on each DCN, the submessages are collected and combined into the original message. Our approach, especially designed for wormhole-routed networks, is conceptually similar but fundamentally very different from the traditional approach of using multiple edge-disjoint spanning trees in parallel for broadcasting in store-and-forward networks. One interesting issue is on the definition of independent DDNs and DCNs, in the sense of wormhole routing. We show how to apply this approach to tori, meshes, and hypercubes. Thorough analyses and comparisons based on different system parameters and configurations are conducted. The results do confirm the advantage of our scheme, under various system parameters and conditions, over other existing broadcasting algorithms 相似文献

3.

Algorithms for distributed termination detection

Friedemann Mattern 《Distributed Computing》1987,2(3):161-175

The termination problem for distributed computations is analyzed in the general context of asynchronous communication. In the underlying computational model it is assumed that messages take an arbitrary but finite time and do not necessarily obey the FIFO rule. Time diagrams are used as a graphic means of representing the overall communication scheme, giving a clear insight into the difficulties involved (e.g., lack of global state or time, inconsistent time cuts) and suggesting possible solutions.Several efficient algorithms for the solution of the termination problem are presented. They are all based on the idea of message counting but have a number of different characteristics. The methods are discussed and compared with other known solutions.Friedemann Matternreceived the Diploma in computer science from the University of Bonn, West Germany, in 1983.He is now a research scientist in the Department of Computer Science at the University of Kaiserslautern and is currently completing his Ph.D. His primary research interests include distributed algorithms, programming language design, and compiler construction. The author can be reached by electronic mail via mattern @ incas.uucp or mattern % uklirb.uucp @ Germany.csne.This work has been supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the SFB124 research project VLSI-Design and Parallelism 相似文献

4.

Message-routing systems for transputer-based multicomputers

Talia D. 《Micro, IEEE》1993,13(3):62-72

The Tiny, CSN, Multiple Rings, and Ordered Dimensions, and interval labeling routing systems for transputer networks are reviewed. The systems are compared with respect to several criteria, such as adaptivity, deadlock freedom, generality, livelock freedom, and network latency 相似文献

5.

Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers

Chun-Yuan Lin Yeh-Ching Chung Jen-Shiuh Liu 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):625-639

Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases. 相似文献

6.

The Paradigm compiler for distributed-memory multicomputers 总被引：1，自引：0，他引：1

Banerjee P. Chandy J.A. Gupta M. Hodges E.W. IV. Holm J.G. Lain A. Palermo D.J. Ramaswamy S. Su E. 《Computer》1995,28(10):37-47

To harness the computational power of massively parallel distributed-memory multicomputers, users must write efficient software. This process is laborious because of the absence of global address space. The programmer must manually distribute computations and data across processors and explicitly manage communication. The Paradigm (PARAllelizing compiler for DIstributed-memory, General-purpose Multicomputers) project at the University of Illinois addresses this problem by developing automatic methods for the efficient parallelization of sequential programs. A unified approach efficiently supports regular and irregular computations using data and functional parallelism 相似文献

7.

Efficient two-stage early SKIP mode termination for depth video coding

Huanqiang Zeng Yongtao Wang Zhe Wei Canhui Cai 《Computers & Electrical Engineering》2014

The video plus depth format has been widely-used in multi-view video systems. Therefore, low complexity depth video coding becomes very essential. For that, an efficient two-stage early SKIP mode termination (TESMT) algorithm is proposed in this paper. First, by using texture-depth correlation, our approach first checks whether current macroblock (MB) is motionless or slow-motion based on motion activity of corresponding region in texture video. If so, SKIP mode is selected as optimal mode and mode decision process is early terminated. Otherwise, our approach further checks whether the rate-distortion cost of SKIP mode is below an adaptive threshold, which is derived by exploiting spatial-temporal correlation between current MB and its adjacent MBs in depth video. Experimental results show that proposed algorithm significantly reduces computational complexity while keeping almost the same coding efficiency of depth video and quality of synthesized view, compared with exhaustive mode decision in multi-view video coding. 相似文献

8.

A programming methodology for dual-tier multicomputers

Baden S.B. Fink S.J. 《IEEE transactions on pattern analysis and machine intelligence》2000,26(3):212-226

Hierarchically organized ensembles of shared memory multiprocessors possess a richer and more complex model of locality than previous generation multicomputers with single processor nodes. These dual-tier computers introduce many new factors into the programmer's performance model. We present a methodology for implementing block-structured numerical applications on dual-tier computers and a run-time infrastructure, called KeLP2, that implements the methodology. KeLP2 supports two levels of locality and parallelism via hierarchical SPMD control flow, run-time geometric meta-data, and asynchronous collective communication. KeLP applications can effectively overlap communication with computation under conditions where nonblocking point-to-point message passing fails to do so. KeLP's abstractions hide considerable detail without sacrificing performance and dual-tier applications written in KeLP consistently outperform equivalent single-tier implementations written in MPI. We describe the KeLP2 model and show how it facilitates the implementation of five block-structured applications specially formulated to hide communication latency on dual-tiered architectures. We support our arguments with empirical data from applications running on various single- and dual-tier multicomputers. KeLP2 supports a migration path from single-tier to dual-tier platforms and we illustrate this capability with a detailed programming example 相似文献

9.

A visualization programming environment for multicomputers 总被引：1，自引：0，他引：1

Bishop G. Monger M. Ramsey P. 《Computer Graphics and Applications, IEEE》1990,10(4):50-58

The programming and run-time environment used for the authors' multicomputer visualization software are described. The particular approach to using multicomputers for scientific visualization provides a uniform interface to system and communications facilities and promotes modularity and code reuse. No breakthrough technology is involved; rather, a collection of methods that have been developed by others has been optimized for visual computing applications and unified into a system that is simple to use and easy to port to new hardware. The C language is used. Initial experience with the system has been good 相似文献

10.

Concurrency in Ada and multicomputers

N.H. Gehani 《Computer Languages, Systems and Structures》1982,7(1):21-23

Implementation of Ada's parallel tasks on a multicomputer architecture requires additional communication and naming overhead because tasks can operate on shared data via global variables and pointers. This increases the complexity of implementing Ada and has a negative impact on program understandability. 相似文献