首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 62 毫秒
1.
We present two new parallel algorithms QSP1 and QSP2 based on sequential quicksort for sorting data on a mesh multicomputer, and analyze their scalability using the isoefficiency metric. We show that QSP2 matches the lower bound on the isoefficiency function for mesh multicomputers, while QSP1 is fairly close to optimal. Langet al. (1) and Schnorret al. (2) have developed parallel sorting algorithms for the mesh architecture that have either optimal (Schnorr) or close to optimal (Lang) run-time complexity for the one-element-perprocessor case. Both QSP1 and QSP2 have better scalability than the scaled-down variants of these algorithms (for the case in which there are more elements than processors). We also analyze a different variant of Lang's sort which is as scalable as QSP2. We briefly discuss another metric called resource consumption. According to this metric, both QSP1 and QSP2 are superior to variants of Lang's sort.  相似文献   

2.
In this paper, a network-partitioning approach for one-to-all broadcasting on wormhole-routed networks is proposed. To broadcast a message, the scheme works in three phases. First, a number of data-distributing networks (DDNs), which can work independently, are constructed. Then the message is evenly divided into submessages, each being sent to a representative node in one DDN. Second, the submessages are broadcast on the DDNs concurrently. Finally, a number of data-collecting networks (DCNs), which can work independently too, are constructed. Then, concurrently on each DCN, the submessages are collected and combined into the original message. Our approach, especially designed for wormhole-routed networks, is conceptually similar but fundamentally very different from the traditional approach of using multiple edge-disjoint spanning trees in parallel for broadcasting in store-and-forward networks. One interesting issue is on the definition of independent DDNs and DCNs, in the sense of wormhole routing. We show how to apply this approach to tori, meshes, and hypercubes. Thorough analyses and comparisons based on different system parameters and configurations are conducted. The results do confirm the advantage of our scheme, under various system parameters and conditions, over other existing broadcasting algorithms  相似文献   

3.
The termination problem for distributed computations is analyzed in the general context of asynchronous communication. In the underlying computational model it is assumed that messages take an arbitrary but finite time and do not necessarily obey the FIFO rule. Time diagrams are used as a graphic means of representing the overall communication scheme, giving a clear insight into the difficulties involved (e.g., lack of global state or time, inconsistent time cuts) and suggesting possible solutions.Several efficient algorithms for the solution of the termination problem are presented. They are all based on the idea of message counting but have a number of different characteristics. The methods are discussed and compared with other known solutions.Friedemann Matternreceived the Diploma in computer science from the University of Bonn, West Germany, in 1983.He is now a research scientist in the Department of Computer Science at the University of Kaiserslautern and is currently completing his Ph.D. His primary research interests include distributed algorithms, programming language design, and compiler construction. The author can be reached by electronic mail via mattern @ incas.uucp or mattern % uklirb.uucp @ Germany.csne.This work has been supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the SFB124 research project VLSI-Design and Parallelism  相似文献   

4.
A symmetric algorithm for detecting the termination of a distributed computation is presented. The algorithm does not require global information concerning the system and does not assume any communication features, barring finite delays in the delivery of messages. It permits dynamic creation and destruction of processes participating in the computation, and also permits destruction of a process by external processes, such as the OS kernel. It also provides for external processes spontaneously joining an ongoing computation. Proofs of safety and liveness are provided.  相似文献   

5.
Talia  D. 《Micro, IEEE》1993,13(3):62-72
The Tiny, CSN, Multiple Rings, and Ordered Dimensions, and interval labeling routing systems for transputer networks are reviewed. The systems are compared with respect to several criteria, such as adaptivity, deadlock freedom, generality, livelock freedom, and network latency  相似文献   

6.
Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, atmosphere and ocean sciences, etc. In our previous work, we have proposed a scheme of extended Karnaugh map representation (EKMR) for multidimensional array representation. We have shown that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.  相似文献   

7.
A fully distributed and symmetric algorithm for solving the distributed termination problem is presented along with its correctness arguments. The algorithm does not make use of time-stamps and clock-synchronization and is very simple.  相似文献   

8.
The Paradigm compiler for distributed-memory multicomputers   总被引:1,自引:0,他引:1  
To harness the computational power of massively parallel distributed-memory multicomputers, users must write efficient software. This process is laborious because of the absence of global address space. The programmer must manually distribute computations and data across processors and explicitly manage communication. The Paradigm (PARAllelizing compiler for DIstributed-memory, General-purpose Multicomputers) project at the University of Illinois addresses this problem by developing automatic methods for the efficient parallelization of sequential programs. A unified approach efficiently supports regular and irregular computations using data and functional parallelism  相似文献   

9.
The video plus depth format has been widely-used in multi-view video systems. Therefore, low complexity depth video coding becomes very essential. For that, an efficient two-stage early SKIP mode termination (TESMT) algorithm is proposed in this paper. First, by using texture-depth correlation, our approach first checks whether current macroblock (MB) is motionless or slow-motion based on motion activity of corresponding region in texture video. If so, SKIP mode is selected as optimal mode and mode decision process is early terminated. Otherwise, our approach further checks whether the rate-distortion cost of SKIP mode is below an adaptive threshold, which is derived by exploiting spatial-temporal correlation between current MB and its adjacent MBs in depth video. Experimental results show that proposed algorithm significantly reduces computational complexity while keeping almost the same coding efficiency of depth video and quality of synthesized view, compared with exhaustive mode decision in multi-view video coding.  相似文献   

10.
A visualization programming environment for multicomputers   总被引:1,自引:0,他引:1  
The programming and run-time environment used for the authors' multicomputer visualization software are described. The particular approach to using multicomputers for scientific visualization provides a uniform interface to system and communications facilities and promotes modularity and code reuse. No breakthrough technology is involved; rather, a collection of methods that have been developed by others has been optimized for visual computing applications and unified into a system that is simple to use and easy to port to new hardware. The C language is used. Initial experience with the system has been good  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号