
1.

Efficient PRAM simulation on a distributed memory machine

R. M. Karp M. Luby F. Meyer auf der Heide 《Algorithmica》1996,16(4-5):517-542

We present algorithms for the randomized simulation of a shared memory machine (PRAM) on a Distributed Memory Machine (DMM). In a PRAM, memory conflicts occur only through concurrent access to the same cell, whereas the memory of a DMM is divided into modules, one for each processor, and concurrent accesses to the same module create a conflict. The delay of a simulation is the time needed to simulate a parallel memory access of the PRAM. Any general simulation of an m processor PRAM on an n processor DMM will necessarily have delay at least m/n. A randomized simulation is called time-processor optimal if the delay is O(m/n) with high probability. Using a novel simulation scheme based on hashing we obtain a time-processor optimal simulation with delay O(log log(n) log*(n)). The best previous simulations use a simpler scheme based on hashing and have much larger delay: (log(n)/log log(n)) for the simulation of an n processor PRAM on an n processor DMM, and (log(n)) in the case where the simulation is time-processor optimal. Our simulations use several (two or three) hash functions to distribute the shared memory among the memory modules of the PRAM. The stochastic processes modeling the behavior of our algorithms and their analyses based on powerful classes of universal hash functions may be of independent interest.

Research partially supported by NSF/DARPA Grant CCR-9005448. Work was done while at the University of California at Berkeley and the International Computer Science Institute, Berkeley, CA. Research partially supported by National Science Foundation Operating Grant CCR-9016468, National Science Foundation Operating Grant CCR-9304722, United States-Israel Binational Science Foundation Grant No. 89-00312, United States-Israel Binational Science Foundation Grant No. 92-00226, and ESPRIT BR Grant EC-US 030. Part of work was done during a visit at the International Computer Science Institute at Berkeley; supported in part by DFG-Forschergruppe Effiziente Nutzung massiv paralleler Systeme, Teilprojekt 4, and by the Esprit Basic Research Action Nr. 7141 (ALCOM II).
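The abstract above says that several hash functions are used to spread shared memory cells over the memory modules. The following is only a toy sketch of that general idea, not the paper's access protocol: it places requests sequentially and greedily, and the names `make_hash` and `module_loads`, the linear hash family, and all parameters are assumptions made for illustration. It compares the worst module load when each cell has one candidate module with the load when it has two.

```python
import random

def make_hash(num_modules, rng, prime=2_147_483_647):
    """Draw one function h(x) = ((a*x + b) mod prime) mod num_modules."""
    a = rng.randrange(1, prime)
    b = rng.randrange(0, prime)
    return lambda cell: ((a * cell + b) % prime) % num_modules

def module_loads(cells, hashes, num_modules):
    """Send each requested cell to the least-loaded of its candidate modules."""
    loads = [0] * num_modules
    for cell in cells:
        candidates = [h(cell) for h in hashes]
        target = min(candidates, key=lambda m: loads[m])
        loads[target] += 1
    return loads

rng = random.Random(0)
n = 1024                                # modules, one per DMM processor
cells = rng.sample(range(10**6), n)     # one distinct cell per PRAM processor
h1, h2 = make_hash(n, rng), make_hash(n, rng)
one_choice = module_loads(cells, [h1], n)       # single hash function
two_choice = module_loads(cells, [h1, h2], n)   # two candidate modules per cell
print(max(one_choice), max(two_choice))         # worst-case module load in each setting
```

The worst module load is what bounds the delay of one simulated step in this toy model, which is why the two maxima are the quantities printed.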

2.

A regular scalable fault tolerant interconnection network for distributed processing

Wei Shi Pradip K. Srimani 《Parallel Computing》2001,27(14):1897-1919

Bounded degree networks like deBruijn graphs or wrapped butterfly networks are very important from a VLSI implementation point of view as well as for applications where the computing nodes in the interconnection network can have only a fixed number of I/O ports. One basic drawback of these networks is that they cannot provide a desired level of fault tolerance because of the bounded degree of the nodes. On the other hand, networks like the hypercube (where the degree of a node grows with the size of the network) can provide the desired fault tolerance, but the design of a node becomes problematic for large networks. In an attempt to combine the best of both worlds, the authors of [IEEE Transactions on Parallel and Distributed Systems 4(9) (1993) 962] proposed hyper-deBruijn (HD) networks, which have many additional features such as logarithmic diameter, partitionability, and embedding. However, HD networks are not regular, are not optimally fault tolerant, and their optimal routing is relatively complex. Our purpose in the present paper is to extend the concepts used in the above-mentioned reference to propose a new family of scalable network graphs that retain all the good features of HD networks and at the same time are regular and maximally fault tolerant; the optimal point-to-point routing algorithm is significantly simpler than that of the HD networks. We have developed some new interesting results on wrapped butterfly networks in the process.

3.

Achieving speedups for APL on an SIMD distributed memory machine

Raymond Greenlaw Lawrence Snyder 《International journal of parallel programming》1990,19(2):111-127

The potential speedup for SIMD parallel implementations of APL programs is considered. Both analytical and (simulated) empirical studies are presented. The approach is to recognize that nearly 95% of the operators appearing in APL programs are either scalar primitive, reduction or indexing and so the performance of these operators gives a good estimate of the amount of speedup a full program might receive. Substantial speedups are demonstrated for these operators and the empirical evidence accords with the analytical estimates. This research has been funded by the Office of Naval Research Contract No. N00014-86-K-0264 and the National Science Foundation Grant No. DCR 8416878.

4.

Multistage ring network: An interconnection network for large scale shared memory multiprocessors

《Journal of Systems Architecture》2000,46(9):765-778

Unidirectional ring-based networks are currently popular choices for high performance large scale shared memory multiprocessors. This class of networks is attractive for its simple hardware interfaces, high speed communication, wider data path, and easy addition of extra nodes. However, a single ring does not scale well due to its fixed bandwidth, and hierarchical ring networks, a natural extension of a single ring, show limited scalability due to their limited bandwidth near the root. In this paper we present a new interconnection network called the Multistage Ring Network (MRN). The MRN has a 2-level hierarchy of rings, and its interconnection of global rings forms a type of multistage network. The architecture of the MRN is effective at diffusing the global traffic on the network to all global rings, and the bandwidth of the network increases proportionally with increases in the system size. Our results show that in peak throughput, the MRN performs seven times better than the hierarchical ring network for a system size of 1024.

5.

A general purpose subroutine for fast Fourier transform on a distributed memory parallel machine

A. Dubey M. Zubair C. E. Grosch 《Parallel Computing》1994,20(12)

One issue which is central in developing a general purpose FFT subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. In this paper we present an FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications. We have also addressed the problem of rearranging the data after computing the FFT. We have evaluated the performance of our implementation on a distributed memory parallel machine, the Intel iPSC/860.

6.

Limits on interconnection network performance

Agarwal A. 《Parallel and Distributed Systems, IEEE Transactions on》1991,2(4):398-412

The latency of direct networks is modeled, taking into account both switch and wire delays. A simple closed-form expression for contention in buffered, direct networks is derived and found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints and under different workload parameters reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network is shown to have the lowest latency only when switch delays and network contention are ignored; three- or four-dimensional networks are favored otherwise. If communication locality exists, two-dimensional networks regain their advantage. Communication locality decreases both the base network latency and the network bandwidth requirements of applications. It is shown that a much larger fraction of the resulting performance improvement arises from the reduction in bandwidth requirements than from the decrease in latency.

7.

Bitonic sort on a chained-cubic tree interconnection network

Sherenaz W. Al-Haj Baddar Basel A. Mahafzah 《Journal of Parallel and Distributed Computing》2014

Bitonic sort is one of the fastest oblivious parallel sorting algorithms known so far. Due to its high modularity, bitonic sort can be mapped to different interconnection networks. In this paper, the bitonic sort algorithm is mapped to the chained-cubic tree (CCT) interconnection network. It is shown that the computation time of the bitonic sort on a CCT (BSCCT) algorithm is \(O((n/p) \times \log(np))\) and that the communication cost is \(O(p \log^{2} p)\), assuming that \(n\) keys are evenly distributed among \(p\) processors that comprise a given CCT network. Simulation is implemented and used to assess the performance of the BSCCT algorithm in terms of computation time, communication cost, message delay, and key comparisons. Simulation results showed that the BSCCT algorithm achieves a speedup that is almost 12-fold relative to a bitonic sort on a single processor, when 1024 processors were used to sort 32M keys.
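For readers unfamiliar with the underlying algorithm, here is a minimal single-processor sketch of the bitonic sorting network. It does not model the CCT mapping or the BSCCT algorithm from the paper; the function name `bitonic_sort` and the power-of-two length restriction are assumptions of this sketch. The sequence of compare-exchange steps is fixed in advance and never depends on the key values, which is what "oblivious" means and what makes the algorithm easy to map onto a network.

```python
def bitonic_sort(keys):
    """Return a sorted copy of keys; len(keys) must be a power of two."""
    a = list(keys)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:              # distance between the two compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:    # handle each pair once
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

Every iteration of the inner `for` loop is independent of the others within the same pass, so in a parallel setting each pass can be executed by all processors at once, with position `i` exchanging data only with position `i ^ j`.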

8.

The hierarchical Petersen network: a new interconnection network with fixed degree

Jung-Hyun Seo Jong-Seok Kim Hyung Jae Chang Hyeong-Ok Lee 《The Journal of supercomputing》2018,74(4):1636-1654

Network cost and a fixed degree are important factors in evaluating interconnection networks. In this paper, we propose the hierarchical Petersen network (HPN), which is constructed in a recursive and hierarchical structure with a Petersen graph as its basic module. The degree of HPN(n) is 5, and HPN(n) has \(10^n\) nodes and \(2.5 \times 10^n\) edges. We analyze its basic topological properties, routing algorithm, diameter, spanning tree, broadcasting algorithm and embedding. From the analysis, we prove that the diameter and network cost of HPN(n) are \(3\log _{10}N-1\) and \(15 \log _{10}N-1\), respectively, and that it contains a spanning tree with a degree of 4. In addition, we propose a link-disjoint one-to-all broadcasting algorithm and show that HPN(n) can be embedded into FP\(_k\) with expansion 1, dilation 2k and congestion 4. For most of the fixed-degree networks proposed, network cost and diameter require \(O(\sqrt{N})\) and the degree of the graph requires O(N). However, HPN(n) requires O(1) for the degree and \(O(\log _{10}N)\) for both diameter and network cost. As a result, the interconnection network suggested in this paper is superior to current fixed-degree and hierarchical networks in terms of network cost, diameter and the degree of the graph.
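The node count, degree and edge count quoted in this abstract can be checked against one another. As a small worked step, assuming every node of HPN(n) has degree exactly 5 (the abstract gives the degree as 5 but does not state regularity explicitly), the handshake lemma gives the stated number of edges:

```latex
\[
|E| \;=\; \frac{1}{2}\sum_{v \in V}\deg(v)
    \;=\; \frac{5 \cdot 10^{n}}{2}
    \;=\; 2.5 \times 10^{n}.
\]
```

Likewise, with \(N = 10^n\) nodes, \(\log_{10} N = n\), so the stated diameter \(3\log_{10}N - 1\) equals \(3n - 1\).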

9.

Constant-time parallel algorithms for image labeling on a reconfigurable network of processors

Alnuweiri H.M. 《Parallel and Distributed Systems, IEEE Transactions on》1994,5(3):320-326

A constant-time algorithm for labeling the connected components of an N×N image on a reconfigurable network of N³ processors is presented. The main contribution of the algorithm is a novel constant-time technique for determining the minimum-labeled PE in each component. The number of processors used by the algorithm can be reduced to \(N^{2+(1/d)}\), for any 1⩽d⩽log N, if O(d) time is allowed.

10.

Fault diagnosis in a Benes interconnection network

Das S. Chaudhuri A. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(7):700-703

Because a Benes network is a back-to-back connection of two Baseline networks, the method for fault diagnosis for the class of nonredundant networks, as elucidated in our previous work, can be mapped directly onto the two nonredundant networks. The individual results from these two networks can be combined to construct a comprehensive algorithm for the Benes network to diagnose a single fault.

11.

A study of the connectivity rate of random networks

《微型机与应用》2016,(19):56-59

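A random graph is a simple network that can be used to abstract many real-world systems. Unlike other network models, the way a random graph is constructed means that its nodes are equivalent to one another and that the network may contain isolated nodes and subgraphs. Studying random graphs, and their connectivity in particular, helps in understanding real networks that have random connections and equivalent nodes. Combining theory with simulation, this article focuses on the connectivity of random graphs and on methods for computing their connectivity rate. It reveals how the shape of a random graph changes as the graph evolves and shows that tree structures are widespread in random graphs. The study also finds that, before the giant connected subgraph forms, the subgraph sizes of a random graph follow a power-law distribution. The results provide a theoretical basis for empirical research on complex networks and for research on phase-transition states of networks with complex properties.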
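The abstract does not say which random-graph model or which definition of connectivity rate the article uses. As a hedged illustration only, the sketch below assumes the Erdős–Rényi G(n, p) model, in which each of the possible edges is present independently with probability p, and takes the connectivity rate to be the fraction of sampled graphs that are connected. The function names `is_connected` and `connectivity_rate` are hypothetical.

```python
import random

def is_connected(n, edges):
    """Union-find check that the n-node graph with the given edges is connected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # merge two components
            components -= 1
    return components == 1

def connectivity_rate(n, p, trials, seed=0):
    """Fraction of sampled G(n, p) graphs that are connected (assumed definition)."""
    rng = random.Random(seed)
    connected = 0
    for _ in range(trials):
        edges = [(u, v) for u in range(n) for v in range(u + 1, n)
                 if rng.random() < p]
        connected += is_connected(n, edges)
    return connected / trials

# Sweep p to watch the estimated rate change as the graph gets denser.
for p in (0.01, 0.05, 0.1, 0.2):
    print(p, connectivity_rate(50, p, trials=200))
```

The same union-find structure also yields the component sizes, so it could be extended to collect the subgraph-size distribution the abstract mentions.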

12.

Operator schemes on a common and distributed memory

A. P. Ershov 《Cybernetics and Systems Analysis》1968,4(4):55-61


13.

Fault-tolerant distributed shared memory on a broadcast-based architecture

Katsinis C. Hecht D. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(12):1082-1092

Due to advances in fiber-optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. Distributed-shared-memory implementations on such networks promise high performance even for applications with small granularity. This paper presents the architecture of one such implementation, called the simultaneous optical multiprocessor exchange bus, and examines the performance of augmented DSM protocols that exploit the natural duplication of data to maintain a recovery memory in each processing node and provide basic fault tolerance. Simulation results show that the additional data duplication necessary to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead at checkpoint creation. Under certain conditions, data blocks that are duplicated to maintain the recovery memory are utilized by the underlying DSM protocol, reducing network traffic, and increasing the processor utilization significantly.

14.

Uniform load sharing on a hierarchical content delivery network interconnection model

Sayan Sen Sarma S. K. Setua 《Innovations in Systems and Software Engineering》2016,12(3):239-248

A content management system (CMS) is an infrastructure for the efficient distribution, organization, and delivery of digital content. Content should be delivered successfully regardless of the end user's location or attachment network. For end-to-end delivery of content, a virtual open content delivery infrastructure is formed by interconnecting several CDNs. In this paper, we focus on content delivery network interconnection (CDNI). An efficient and easily implemented hierarchical CDNI architecture, named HCDNI, is proposed to reduce the limitations of CDNIs. Next, a content distribution and redistribution scheme is proposed so that the searching time and the round trip time for content delivery can be minimized. We then present a reliable and fault-tolerant scheme for web server replica placement and content caching. Finally, analysis and simulation studies show that the proposed algorithm yields a significant improvement in data routing, path selection, content distribution and redistribution, load balancing and network scalability.

15.

CGIN: a fault tolerant modified Gamma interconnection network

Po-Jen Chuang 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(12):1301-1306

To improve the terminal reliability of the Gamma interconnection network (GIN), we consider altering its connecting patterns between stages to attain multiple disjoint paths between any source and destination pair. The new modified GIN, referred to as a CGIN because its connecting patterns between stages exhibit a cyclic feature, is able to tolerate any arbitrary single fault and to raise terminal reliability accordingly. If several rows of switching elements are fabricated in one chip using VLSI technology, a CGIN could lead to reduced cost because the pin count per chip decreases and the layout area taken by connections shrinks. To make routing and rerouting in the CGIN more efficient and simpler to implement, destination tag routing and rerouting are also provided.

16.

Perfect load balancing on the star interconnection network

N. Imani H. Sarbazi-Azad S. G. Akl 《The Journal of supercomputing》2007,41(3):269-286

In this paper, we use the regular distribution method to design a perfect load balancing algorithm for an n-star with a maximum error of 1 and a time complexity of 3n(n+1). This algorithm is based on the novel notion of leader trees. A second algorithm, proposed in this paper as an enhancement to the first, uses an arbitrary spanning tree as the leader tree and has a worst-case time complexity of 2.25n²−3n+0.75. We also discuss the issue of dynamically selecting the leader tree and hybrid load balancing algorithms in general. Furthermore, we present a hybrid algorithm for load balancing on the star interconnection network which benefits from a diffusion load balancing preprocessing phase and shows a smaller mean time complexity than our first two algorithms.

17.

Convergence testing on a distributed network of processors

《International Journal of Computer Mathematics》2012,89(2):357-378

The termination of iterative algorithms on a distributed network of transputers is an important issue with the increasing usage of parallel computers.

In this paper we analyse the computational and communication costs of performing the convergence tests on the solution of the Laplace Equation on a two dimensional region, i.e., the unit square.

Finally, a strategy of terminating the iteration without convergence testing is demonstrated.
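To make the cost being analysed concrete, here is a minimal serial sketch of Jacobi iteration for the Laplace equation on a square grid. It is not the paper's transputer code, and the abstract does not describe the paper's strategy for terminating without convergence testing, so that strategy is not reproduced. The function name `jacobi_laplace`, the choice of Jacobi iteration, and the `check_every` parameter are assumptions of this sketch. The point it illustrates is that the convergence test is the one step needing a global maximum over all grid points, which on a distributed network means communication among all processors, so running it only every few sweeps reduces that cost.

```python
def jacobi_laplace(m, boundary, tol=1e-6, check_every=10, max_iters=100_000):
    """Jacobi iteration for Laplace's equation with m x m interior points (m >= 1).

    boundary(i, j) gives the fixed value at boundary grid point (i, j).
    The convergence test runs only every check_every sweeps.
    Returns (grid, sweeps_performed).
    """
    size = m + 2
    u = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i in (0, size - 1) or j in (0, size - 1):
                u[i][j] = boundary(i, j)
    for sweep in range(1, max_iters + 1):
        new = [row[:] for row in u]
        for i in range(1, size - 1):
            for j in range(1, size - 1):
                # Each interior point becomes the average of its four neighbours.
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1])
        if sweep % check_every == 0:
            # Global reduction: the only step that needs data from every point.
            diff = max(abs(new[i][j] - u[i][j])
                       for i in range(1, size - 1)
                       for j in range(1, size - 1))
            if diff < tol:
                return new, sweep
        u = new
    return u, max_iters

grid, sweeps = jacobi_laplace(4, lambda i, j: 1.0)
print(sweeps, grid[2][2])
```

Testing less often trades a few extra sweeps of purely local work for fewer global reductions; the best interval depends on the relative cost of computation and communication on the machine in question.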

18.

Efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers

Chun-Yuan Lin Yeh-Ching Chung Jen-Shiuh Liu 《Parallel and Distributed Systems, IEEE Transactions on》2003,14(7):625-639

Array operations are useful in a large number of important scientific codes, such as molecular dynamics, finite element methods, climate modeling, and atmosphere and ocean sciences. In our previous work, we proposed the extended Karnaugh map representation (EKMR) scheme for multidimensional array representation. We showed that sequential multidimensional array operation algorithms based on the EKMR scheme have better performance than those based on the traditional matrix representation (TMR) scheme. Since parallel multidimensional array operations have been an extensively investigated problem, we present efficient data parallel algorithms for multidimensional array operations based on the EKMR scheme for distributed memory multicomputers. In a data parallel programming paradigm, in general, we distribute array elements to processors based on various distribution schemes, do local computation in each processor, and collect computation results from each processor. Based on the row, column, and 2D mesh distribution schemes, we design data parallel algorithms for matrix-matrix addition and matrix-matrix multiplication array operations in both TMR and EKMR schemes for multidimensional arrays. We also design data parallel algorithms for six Fortran 90 array intrinsic functions: All, Maxval, Merge, Pack, Sum, and Cshift. We compare the time of the data distribution, the local computation, and the result collection phases of these array operations based on the TMR and the EKMR schemes. The experimental results show that algorithms based on the EKMR scheme outperform those based on the TMR scheme for all test cases.
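The three phases named in this abstract (data distribution, local computation, result collection) can be sketched for matrix-matrix addition under the row distribution scheme. This is an illustration under stated assumptions, not the paper's algorithm: it uses ordinary nested lists rather than the EKMR layout, the processors are simulated by list entries in a single process, and the names `row_blocks` and `parallel_add` are hypothetical.

```python
def row_blocks(num_rows, num_procs):
    """Split row indices 0..num_rows-1 into num_procs contiguous blocks."""
    base, extra = divmod(num_rows, num_procs)
    blocks, start = [], 0
    for rank in range(num_procs):
        stop = start + base + (1 if rank < extra else 0)
        blocks.append(range(start, stop))
        start = stop
    return blocks

def parallel_add(a, b, num_procs):
    """Matrix addition in three phases: distribute, compute locally, collect."""
    blocks = row_blocks(len(a), num_procs)
    # Phase 1: give each simulated processor its rows of both operands.
    local = [([a[i] for i in blk], [b[i] for i in blk]) for blk in blocks]
    # Phase 2: each processor adds only the rows it owns.
    partial = [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(la, lb)]
               for la, lb in local]
    # Phase 3: collect the partial results in rank order.
    return [row for part in partial for row in part]

a = [[1, 2], [3, 4], [5, 6]]
b = [[10, 20], [30, 40], [50, 60]]
print(parallel_add(a, b, num_procs=2))
```

Addition needs no communication during the local phase, which is why the distribution and collection phases are timed separately in comparisons like the one the abstract describes.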

19.

Research on interconnecting BACnet networks with the Internet

刘棣景新幸莫榕《微计算机信息》2002,18(6):59-61

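BACnet, "A Data Communication Protocol for Building Automation and Control Networks", is an open protocol that allows building automation and control equipment from different manufacturers to communicate with one another and share information. After describing and analyzing the two existing ways of interconnecting BACnet networks with the Internet, this paper proposes a BACnet/IP gateway interconnection method that simplifies and unifies the interconnection of BACnet networks with the Internet.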

20.

A trusted interconnection method between multilevel security domains oriented to classified protection

孙燕金舒原杨智殷丽华《信息安全与技术》2010,(8)
