期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Equivalence of multistage interconnection networks

《Information Processing Letters》1987,26(1):45-50

In recent years, many multistage interconnection networks using 2 × 2 switching cells have been proposed for parallel architecture. Here we state a correct and easy graph characterization of all the networks topologically equivalent to the Omega, Flip, Baseline, Reverse Baseline, Indirect Binary Cube, and Modified Data Manipulator networks. 相似文献

2.

A Parallel Algorithm for Traffic Control Problems in Three-Stage Connecting Networks

《Journal of Parallel and Distributed Computing》1994,20(2):236-240

A parallel heuristic algorithm for traffic control problems in three-stage connecting networks is presented in this paper. A three-stage connecting network consists of an input crossbar switching stage, an intermediate crossbar switching stage, and an output crossbar switching stage. The goal of our algorithm is to quickly and efficiently find a conflict-free switching assignment for communication demands through the network. The algorithm requires n² × m processing elements for the network composed of n input/output switches and m intermediate switches, where it runs not only on a sequential machine, but also on a parallel machine with maximally n² × m processors. The algorithm was verified by 1100 simulation runs with the network size from 10² × 7 to 50² × 27. The simulation results show that the algorithm can find a solution in nearly constant time with n² × m processors. 相似文献

3.

The hierarchical Petersen network: a new interconnection network with fixed degree

Jung-Hyun Seo Jong-Seok Kim Hyung Jae Chang Hyeong-Ok Lee 《The Journal of supercomputing》2018,74(4):1636-1654

Network cost and fixed-degree characteristic for the graph are important factors to evaluate interconnection networks. In this paper, we propose hierarchical Petersen network (HPN) that is constructed in recursive and hierarchical structure based on a Petersen graph as a basic module. The degree of HPN(n) is 5, and HPN(n) has $10^n$ nodes and $2.5 \times 10^n$ edges. And we analyze its basic topological properties, routing algorithm, diameter, spanning tree, broadcasting algorithm and embedding. From the analysis, we prove that the diameter and network cost of HPN(n) are $3\log _{10}N-1$ and $15 \log _{10}N-1$, respectively, and it contains a spanning tree with the degree of 4. In addition, we propose link-disjoint one-to-all broadcasting algorithm and show that HPN(n) can be embedded into FP$_k$ with expansion 1, dilation 2k and congestion 4. For most of the fixed-degree networks proposed, network cost and diameter require $O(\sqrt{N})$ and the degree of the graph requires O(N). However, HPN(n) requires O(1) for the degree and $O(\log _{10}N)$ for both diameter and network cost. As a result, the suggested interconnection network in this paper is superior to current fixed-degree and hierarchical networks in terms of network cost, diameter and the degree of the graph. 相似文献

4.

A Novel Approach to Fast Discrete Fourier Transform

《Journal of Parallel and Distributed Computing》1998,54(1):48-58

Discrete Fourier transform (DFT) is an important tool in digital signal processing. In the present paper, we propose a novel approach to performing DFT. We transform DFT into a form expressed in discrete moments via a modular mapping and truncating Taylor series expansion. From this, we extend the use of our systolic array for fast computation of moments without any multiplications to one that computes DFT with only a few multiplications and without any evaluations of exponential functions. The multiplication number used in our method isO(Nlog₂ N/ log₂log₂ N) superior toO(Nlog₂ N) in FFT. The execution time of the systolic array is onlyO(Nlog₂ N/ log₂log₂ N) for 1-D DFT andO(N^k) fork-D DFT (k⩾2). The systolic implementation is a demonstration of the locality of dataflow in the algorithms and hence it implies an easy and potential hardware/VLSI realization. The approach is also applicable to DFT inverses. 相似文献

5.

L3: An FPGA-based multilayer maze routing accelerator

《Microprocessors and Microsystems》2005,29(2-3):87-97

This paper describes a multi-layer maze routing accelerator which uses a two-dimensional array of processing elements (PEs) implemented in an FPGA. Routing for an L-layer N×N grid is performed by an array of N×N PEs that time-multiplex each layer over the array. This accelerates the classic Lee Algorithm from O(L×d²) in software to O(L×d). Each PE can be implemented in 32 look up tables in a Xilinx Virtex-II FPGA, which makes possible routing arrays that are large enough to support detailed routing for VLSI. Cycle measurements show a speedup of 50–75× over a 2.54 GHz Pentium 4 for a 4-layer 8×8 array and 93× for a 4-layer 16×16 array. 相似文献

6.

The New Class of g-Chain Periodic Sorters

Ronald I Becker David Nassimi Yehoshua Perl 《Journal of Parallel and Distributed Computing》1998,54(2):1601

Aperiodic sorteris a sorting network that is a cascade of a number of identicalblocks, where outputiof each block is inputiof the next block. Previously, Dowd, Perl, Rudolph, and Saks introduced thebalancedmerging network, withN=2^kinputs/outputs and log Nstages of comparators. Using a very intricate proof, they showed that a cascade of log Nsuch blocks constitutes a sorting network. In this paper, we introduce a large class of merging networks with the same periodic property. This class contains 2^N/2−1networks, whereN=2^kis the number of inputs. The balanced merger is one network in this class. Other networks use fewer comparators. We provide a very simple and elegant correctness proof based on the recursive structure of the networks. 相似文献

7.

A novel constant degree and constant congestion DHT scheme for peer-to-peer networks 总被引：3，自引：0，他引：3

LI Dongsheng & LU Xicheng School of Computer National University of Defense Technology Changsha China 《中国科学F辑(英文版)》2005,48(4):421-436

1 Introduction and related work In recent years, peer-to-peer computing has attracted significant attention from both industry field and academic field[1-3]. The core component of many proposed peer-to- peer systems is the distributed hash table (DHT) schemes[4,5] that use a hash table-like interface to publish and look up data objects. Many proposed DHT schemes[6-15] are based on some traditional interconnection to- pology: Chord[6], Tapestry[7,8], Pastry[9] are based on hypercube topolog… 相似文献

8.

k-pairwise disjoint paths routing in perfect hierarchical hypercubes

Antoine Bossard Keiichi Kaneko 《The Journal of supercomputing》2014,67(2):485-495

Hierarchical hypercubes (HHC), also known as cube-connected cubes, have been introduced in the literature as an interconnection network for massively parallel systems. Effectively, they can connect a large number of nodes while retaining a small diameter and a low degree compared to a hypercube of the same size. Especially (2^m+m)-dimensional hierarchical hypercubes ( $\mathit {HHC}_{2^{m}+m}$ ), called perfect HHCs, are popular as they are symmetrical, which is a critical property when designing routing algorithms. In this paper, we describe an algorithm finding, in an $\mathit{HHC}_{2^{m}+m}$ , mutually node-disjoint paths connecting k=?(m+1)/2? pairs of distinct nodes. This problem is known as the k-pairwise disjoint-path routing problem and is one of the important routing problems when dealing with interconnection networks. In an $\mathit{HHC}_{2^{m}+m}$ , our algorithm finds paths of lengths at most 2^m+1+m(2^m+1+1)+4 in O(2^5m) time, where 2^m+1 is the diameter of an $\mathit{HHC}_{2^{m}+m}$ . Also, we have shown through an experiment that, in practice, the lengths of the generated paths are significantly lower than the worst-case theoretical estimations. 相似文献

9.

Analysis of Multi-Sort Algorithm on Multi-Mesh of Trees (MMT) architecture

Nitin Rakesh Nitin 《The Journal of supercomputing》2011,57(3):276-313

Various sorting algorithms using parallel architectures have been proposed in the search for more efficient results. This paper introduces the Multi-Sort Algorithm for Multi-Mesh of Trees (MMT) Architecture for N=n ⁴ elements with more efficient time complexity compared to previous architectures. The shear sort algorithm on Single Instruction Multiple Data (SIMD) mesh model requires $4\sqrt{N}+O\sqrt{N}$ time for sorting N elements, arranged on a $\sqrt{N}\times \sqrt{N}$ mesh, whereas Multi-Sort algorithm on the SIMD Multi-Mesh (MM) Architecture takes O(N ^1/4) time for sorting the same N elements, which proves that Multi-Sort is a better sorting approach. We have improved the time complexity of intrablock Sort. The Communication time complexity for 2D Sort in MM is O(n), whereas this time in MMT is O(log?n). The time complexity of compare–exchange step in MMT is same as that in MM, i.e., O(n). It has been found that the time complexity of the Multi-Sort on MMT has been improved as on Multi-Mesh architecture. 相似文献

10.

Computing the subset partial order for dense families of sets

Amr Elmasry 《Information Processing Letters》2009,109(18):1082-1086

We give an algorithm to compute the subset partial order (called the subset graph) for a family F of sets containing k sets with N elements in total and domain size n. Our algorithm requires O(nk²/logk) time and space on a Pointer Machine. When F is dense, i.e. N=Θ(nk), the algorithm requires O(N²/log²N) time and space. We give a construction for a dense family whose subset graph is of size Θ(N²/log²N), indicating the optimality of our algorithm for dense families. The subset graph can be dynamically maintained when F undergoes set insertions and deletions in O(nk/logk) time per update (that is sub-linear in N for the case of dense families). If we assume words of b?k bits, allow bits to be packed in words, and use bitwise operations, the above running time and space requirements can be reduced by a factor of blog(k/b+1)/logk and b²log(k/b+1)/logk respectively. 相似文献

11.

Sorting n Numbers on n × n Reconfigurable Meshes with Buses

《Journal of Parallel and Distributed Computing》1994,23(1):37-48

We show how column sort and rotate sort can be implemented on the different reconfigurable mesh with buses (RMB) architectures that have been proposed in the literature. On all of these proposed RMB architectures, we are able to sort n numbers on an n × n configuration in O(1) time. For the PARBUS RMB architecture, our column sort and rotate sort implementations are simpler than the O(1) sorting algorithms developed in Jang and Prasanna (International Parallel Processing Symposium, 1992) and Lin et al. (Proceedings of Ninth European Workshop on Parallel Computing, 1992). Furthermore, our sorting algorithms use fewer bus broadcasts. For the RMESH RMB architecture, our algorithms are the first to sort n numbers on an n × n configuration in O(1) time. We also observe that rotate sort can be implemented on N × N × ··· × N (k + 1)-dimensional RMB architectures to sort N^k elements in O(1) time. 相似文献

12.

The complexity of preemptive scheduling given interprocessor communication delays

《Information Processing Letters》1987,25(2):123-125

We discuss the problem of scheduling af set of independent tasks T, each t_i ϵ T of lenght ℓ_i ϵ Z⁺, on m identical processors. We allow preemption but assume a communication delay of time k ϵ N. Whenever a task is preempted from one processor to another, there must be a delay of at least k time units. We show that if k = 1, an optimal schedule can be found in polynomial time but if k ⩾ 2, the corresponding decision problem is NP-complete. 相似文献

13.

One-step t-fault diagnosis for hypermesh optical interconnection multiprocessor systems

Xingchang Xiaofan Min 《Journal of Systems and Software》2009,82(9):1491-1496

To maintain high reliability and availability, system-level diagnosis should be considered for the multiprocessor systems. The self-diagnosis problem of hypermesh, emerging potential optical interconnection networks for multiprocessor systems, is solved in this paper. We derive that the precise one-step diagnosability of kⁿ-hypermesh is n(k − 1). Based on the principle of cycle decomposition, a one-step t-fault diagnosis algorithm for kⁿ-hypermesh which runs in O(kⁿn(k − 1)) time also is described. 相似文献

14.

互连网络拓扑等价的图分析法 总被引：9，自引：1，他引：8

艾军曹明翠《计算机研究与发展》1994,31(3):29-33

提出了描述互连网络拓扑等价的图分析法。获得了全交叉网络与基准，逆基准，Ｏｍｅｇａ，ｆｌｉｐ，Ｓ＝Ｆ＝２ＳＷ榕树，简化数据变换等多级互连网络拓扑等价的逻辑名结构。阐明了用光学全交叉网络模拟实现上述网络的互连函数的原理及其多处理机，电信交换等领域的潜在应用。相似文献

15.

A New Topology with Odd Degree for Multiprocessor Systems

《Journal of Parallel and Distributed Computing》1996,39(1):87-94

A new topology for interconnection networks has been proposed. The underlying network graph hasN= 4ⁿnodes (n≥ 2) and isalmostregular with maximum degree 5 and diameter ≤ ⌊3/4 log₂N⌋ + 1. Algorithms for point-to-point routing and single node broadcast have also been developed. It has also been shown that various algorithms for real life applications, e.g., matrix transpose, matrix multiplication, finding the sum/average/maximum/minimum of a set of data elements and ASCEND/DESCEND types of algorithms can be efficiently implemented on this topology. Finally, the underlying idea of constructing this network has been generalized to define a family ofalmost regularodd degree graphs of maximum degree 2j+ 1, (j> 2) withN= (2j)ⁿnodes and diameter ⌊3/4 log_jN⌋ + 1. 相似文献

16.

Partitionability of k-Extra-Stage Omega Networks and an Optimal Task Migration Algorithm

《Journal of Parallel and Distributed Computing》2000,60(3):334-348

In a large-scale parallel computing system, partitionability is an important feature that allows a large system to be partitioned into smaller subsystems so that multiple tasks can be run independently on different subsystems. In such a system, a task run on a subsystem may need to be migrated to another subsystem if the subsystem is faulty, if load balance is needed, or if subsystem restructuring is desired. This paper studies the partitionability of k-extra-stage Omega networks and presents an optimal task migration algorithm for k-extra-stage Omega networks. 相似文献

17.

Block splittings for the conjugate gradient method

David Kratzer Seymour V. Parter Michael Steuerwalt 《Computers & Fluids》1983,11(4):255-279

We study the generalized conjugate gradient scheme, based on the k-line and k × k block Jacobi splittings A = M ? N, for solving two-dimensional parabolic and elliptic difference equations

AU = F ?

. Here A represents the difference operator ch^α ? h²Δ_h. Computational experiments suggest that the eigenvalues of K: = I ? M^?1N cluster, and that the cluster radii decrease as k or ch^α increases. As is well known, clustering improves convergence of the conjugate gradient iterates. We discuss computational results for k = 4, 8, 16, 32 and for ch^α = 0, h, 2. Moreover, we establish the number and size of eigenvalue clusters for the model problem. 相似文献

18.

A parallel routing algorithm on recursive cube of rings networks employing Hamiltonian circuit Latin square

Dongmin Choi 《Information Sciences》2008,178(6):1533-1541

Recursive cube of rings (RCR) networks [Y. Sun, P. Cheung, X. Lin, Recursive cube of rings: a new topology for interconnection networks, IEEE Trans. Parallel Dist. Syst. 11 (3) (2000) 275-286] can be widely used in the design and implementation of parallel processing architectures. In this paper, we investigate the routing of a message on RCR networks, that is a key to the performance of this network. We would like to transmit k+1 packets from a source node to a destination node simultaneously along paths on RCR networks, the ith packet will traverse along the ith path (1?i?k+1). In order for all packets to arrive at a destination node quickly and securely, the ith path must be node-disjoint from all other paths. For construction of these paths, employing Hamiltonian circuit Latin square (HCLS), we present O(n²) parallel routing algorithm on RCR networks. 相似文献

19.

A faster parameterized algorithm for set packing

Ioannis Koutis 《Information Processing Letters》2005,94(1):7-9

We present an efficient parameterized algorithm for the (k,t)-set packing problem, in which we are looking for a collection of k disjoint sets whose union consists of t elements. The complexity of the algorithm is O(²O(t)nNlogN). For the special case of sets of bounded size, this improves the O(k(ck)n) algorithm of Jia et al. [J. Algorithms 50 (1) (2004) 106]. 相似文献

20.

Ambiguity and Communication

Juraj Hromkovič Georg Schnitger 《Theory of Computing Systems》2011,48(3):517-534

The ambiguity of a nondeterministic finite automaton (NFA) N for input size n is the maximal number of accepting computations of N for inputs of size n. For every natural number k we construct a family $(L_{r}^{k}\;|\;r\in \mathbb{N})$ of languages which can be recognized by NFA’s with size k?poly(r) and ambiguity O(n ^k), but $L_{r}^{k}$ has only NFA’s with size exponential in r, if ambiguity o(n ^k) is required. In particular, a hierarchy for polynomial ambiguity is obtained, solving a long standing open problem (Ravikumar and Ibarra, SIAM J. Comput. 19:1263–1282, 1989, Leung, SIAM J. Comput. 27:1073–1082, 1998). 相似文献