1.

Efficient nonblocking switching networks for interprocessor communications in multiprocessor systems

Fong-Chih Shao Yavuz Oruc A. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(2):132-141

The performance of a multiprocessor system depends heavily on its ability to provide conflict-free paths among its processors. In this paper, we explore the possibility of using a nonblocking network with O(N log N) edges (crosspoints) to interconnect the processors of an N-processor system. We combine Bassalygo and Pinsker's implicit design of strictly nonblocking networks with an explicit construction of expanders to obtain a strictly nonblocking network with -765.18N+352.8N log N edges and 2+log(N/5) depth. We present an efficient parallel algorithm for routing connection requests on this network and implement it on three parallel processor topologies. The implementation on a parallel processor whose processing elements are interconnected as in the Bassalygo-Pinsker network requires O(N log N) processing elements and O(N log N) interprocessor links, and it takes O(log N) steps to route any single connection request, where each step involves a small number (≈72) of bit-level operations. A contracted or folded version of the same implementation reduces the processing element count to O(N) without increasing the link count or the routing time. Finally, we establish that the same algorithm takes O(log³ N) steps on a perfect shuffle processor with O(N) processing elements. These results improve the crosspoint, depth and routing time complexities of the previously reported strictly nonblocking networks.

2.

Multiple Addition and Prefix Sum on a Linear Array with a Reconfigurable Pipelined Bus System

Amitava Datta 《The Journal of supercomputing》2004,29(3):303-317

We present several fast algorithms for multiple addition and prefix sum on the Linear Array with a Reconfigurable Pipelined Bus System (LARPBS), a recently proposed architecture based on optical buses. Our algorithm for adding N integers runs on an N log M-processor LARPBS in O(log* N) time, where log* N is the number of times the logarithm has to be taken to reduce N below 1 and M is the largest integer in the input. Our addition algorithm improves the time complexity of several matrix multiplication algorithms proposed by Li, Pan and Zheng (IEEE Trans. Parallel and Distributed Systems, 9(8):705–720, 1998). We also present several fast algorithms for computing prefix sums of N integers on the LARPBS. For integers with bounded magnitude, our first algorithm for prefix sum computation runs in O(log log N) time using N processors and in O(1) time using N^(1+ε) processors, for ε < 1. For integers with unbounded magnitude, the first algorithm for multiple addition runs in O(log log N log* N) time using N log M processors, when M is the largest integer in the input. Our second algorithm for multiple addition runs in O(log* N) time using N^(1+ε) log M processors, for ε < 1. We also show suitable extensions of our algorithm for real numbers.
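The iterated logarithm used in these bounds can be illustrated with a minimal sketch. It follows the wording of the abstract (count applications of the logarithm until the value falls below 1); the base-2 logarithm and the function name `log_star` are assumptions made here for illustration, and the more common textbook convention (stop at a value of at most 1) gives a result that is smaller by one.

```python
import math


def log_star(n: float) -> int:
    """Count how many times log2 must be applied to n before the value is below 1."""
    count = 0
    while n >= 1:
        n = math.log2(n)  # one application of the logarithm
        count += 1
    return count


# 65536 -> 16 -> 4 -> 2 -> 1 -> 0, i.e. five applications
print(log_star(65536))
```

The function grows extremely slowly, which is why an O(log* N) running time is close to constant for any practical input size.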

3.

Conference key agreement protocol with non-interactive fault-tolerance over broadcast network

Jiin-Chiou Cheng Chi-Sung Laih 《International Journal of Information Security》2009,8(1):37-48

Most conventional conference key agreement protocols have not addressed a practical situation: there may exist malicious conferees who attempt to block conference initiation for some purpose, e.g. commercial, political or military benefit. In instances where a conference must be launched immediately due to an emergency, efficient detection of malicious behavior is needed. Recently, Tzeng (IEEE Trans. Comput. 51(4):373–379, 2002) proposed a fault-tolerant conference key agreement protocol to address this issue, in which a conference key can be established among conferees even though malicious conferees exist. However, his protocol might be complex and inefficient during fault detection. In the case where a malicious conferee exists and the fault-tolerant mechanism is launched, complicated interactions between conferees are required. In this paper, we introduce a novel strategy in which any malicious conferee may be identified and removed from the conferee list without any interaction. With such non-interactive fault tolerance, conferences can be established and started efficiently. A complete example of our protocol is given to illustrate the fault tolerance. We analyse the security of our protocol regarding four aspects, i.e. correctness, fault tolerance, active attack and passive attack. Performance comparisons between our protocol and that of Tzeng are also shown. On the whole, our protocol is superior to that of Tzeng in the situation where malicious conferees exist.


4.

The performance of greedy algorithms for the on-line Steiner tree and related problems

J. Westbrook D. C. K. Yan 《Theory of Computing Systems》1995,28(5):451-468

We study the on-line Steiner tree problem on a general metric space. We show that the greedy on-line algorithm is O(log((d/z)s))-competitive, where s is the number of regular nodes, d is the maximum metric distance between any two revealed nodes, and z is the optimal off-line cost. Our results refine the previous known bound [9] and show that Algorithm SB of Bartal et al. [3] for the on-line file allocation problem is O(log log N)-competitive on an N-node hypercube or butterfly network. A lower bound of Ω(log((d/z)s)) is shown to hold. We further consider the on-line generalized Steiner problem on a general metric space. We show that a class of lazy and greedy deterministic on-line algorithms are O(√k · log k)-competitive and no on-line algorithm is better than Ω(log k)-competitive, where k is the number of distinct nodes that appear in the request sequence. For the on-line Steiner problem on a directed graph, it is shown that no deterministic on-line algorithm is better than s-competitive and the greedy on-line algorithm is s-competitive. A preliminary version of this paper has appeared in the Proceedings of the Workshop on Algorithms and Data Structures, 1993, Montréal. The first author's research was partially supported by NSF Grant CCR-9009753, whilst that of the second author was partially supported by NSF Grant DDM-8909660 and a University Fellowship from the Graduate School, Yale University.
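The greedy on-line strategy analysed above can be sketched as follows, under the usual reading that each newly revealed node is attached to the closest node revealed so far. The function name `greedy_online_steiner` and the `dist` callback are hypothetical names introduced for this illustration; the paper's own formulation may differ in detail.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def greedy_online_steiner(
    requests: Sequence[Hashable],
    dist: Callable[[Hashable, Hashable], float],
) -> Tuple[List[Tuple[Hashable, Hashable]], float]:
    """Attach every newly revealed node to its nearest already-revealed node."""
    revealed: List[Hashable] = []
    edges: List[Tuple[Hashable, Hashable]] = []
    total_cost = 0.0
    for node in requests:
        if node in revealed:
            continue  # repeated request: already connected, nothing to pay
        if revealed:
            # Greedy choice: cheapest connection to the current tree nodes.
            nearest = min(revealed, key=lambda u: dist(u, node))
            edges.append((nearest, node))
            total_cost += dist(nearest, node)
        revealed.append(node)
    return edges, total_cost


# Points on a line, with the absolute difference as the metric.
edges, cost = greedy_online_steiner([0, 10, 4, 5], lambda u, v: abs(u - v))
print(edges, cost)
```

In this small run the greedy tree costs 15 while the optimal off-line tree (the path 0, 4, 5, 10) costs 10, which illustrates why the competitive ratio is the quantity of interest.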

5.

Optimization of strictly nonblocking multicast three-stage Clos networks

刘燕君 于璠 鲍远律 《小型微型计算机系统》2012,33(3):452-456

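How to keep the hardware cost as low as possible while remaining strictly nonblocking is an important problem in the design of multicast three-stage Clos networks. A method for optimizing the hardware cost of the network is proposed, and the optimal hardware cost of a strictly nonblocking multicast three-stage Clos network is given for two cases: multicast without restriction, and multicast restricted in the middle stage. Analysis shows that the hardware cost of the optimized network is effectively reduced and, in some cases, is even lower than that of a wide-sense nonblocking network. Moreover, compared with a wide-sense nonblocking network, this network always remains strictly nonblocking without requiring a specific routing algorithm, which reduces the time complexity to some extent.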

6.

Parallel algorithms for routing in nonblocking networks

Geng Lin Nicholas Pippenger 《Theory of Computing Systems》1994,27(1):29-40

We construct nonblocking networks that are efficient not only as regards their cost and delay, but also as regards the time and space required to control them. In this paper we present the first simultaneous weakly optimal solutions for the explicit construction of nonblocking networks, the design of algorithms and data-structures. Weakly optimal is in the sense that all measures of complexity (size and depth of the network, time for the algorithm, space for the data-structure, and number of processor-time product) are within one or more logarithmic factors of their smallest possible values. In fact, we construct a scheme in which networks with n inputs and n outputs have size O(n (log n)²) and depth O(log n), and we present deterministic and randomized on-line parallel algorithms to establish and abolish routes dynamically in these networks. In particular, the deterministic algorithm uses O((log n)⁵) steps to process any number of transactions in parallel (with one processor per transaction), maintaining a data structure that uses O(n (log n)²) words.

7.

Analysis of Multi-Sort Algorithm on Multi-Mesh of Trees (MMT) architecture

Nitin Rakesh Nitin 《The Journal of supercomputing》2011,57(3):276-313

Various sorting algorithms using parallel architectures have been proposed in the search for more efficient results. This paper introduces the Multi-Sort Algorithm for the Multi-Mesh of Trees (MMT) Architecture for N = n⁴ elements, with more efficient time complexity compared to previous architectures. The shear sort algorithm on the Single Instruction Multiple Data (SIMD) mesh model requires \(4\sqrt{N}+O(\sqrt{N})\) time for sorting N elements arranged on a \(\sqrt{N}\times \sqrt{N}\) mesh, whereas the Multi-Sort algorithm on the SIMD Multi-Mesh (MM) Architecture takes O(N^(1/4)) time for sorting the same N elements, which shows that Multi-Sort is a better sorting approach. We have improved the time complexity of the intrablock sort. The communication time complexity for the 2D sort in MM is O(n), whereas in MMT it is O(log n). The time complexity of the compare–exchange step in MMT is the same as that in MM, i.e., O(n). It has been found that the time complexity of Multi-Sort on MMT is improved compared with that on the Multi-Mesh architecture.

8.

A Fast Direct Solver for a Class of Elliptic Partial Differential Equations

Per-Gunnar Martinsson 《Journal of scientific computing》2009,38(3):316-330

We describe a fast and robust method for solving the large sparse linear systems that arise upon the discretization of elliptic partial differential equations such as Laplace’s equation and the Helmholtz equation at low frequencies. While most existing fast schemes for this task rely on so-called “iterative” solvers, the method described here solves the linear system directly (to within an arbitrary predefined accuracy). The method is described for the particular case of an operator defined on a square uniform grid, but can be generalized to other geometries. For a grid containing N points, a single solve requires O(N log² N) arithmetic operations and storage. Storing the information required to perform additional solves rapidly requires O(N log N) storage. The scheme is particularly efficient in situations involving domains that are loaded on the boundary only and where the solution is sought only on the boundary. In this environment, subsequent solves (after the first) can be performed in operations. The efficiency of the scheme is illustrated with numerical examples. For instance, a system of size 10⁶×10⁶ is directly solved to seven digits of accuracy in four minutes on a 2.8 GHz P4 desktop PC.

9.

On the Advice Complexity of the k-server Problem Under Sparse Metrics

Sushmita Gupta Shahin Kamali Alejandro López-Ortiz 《Theory of Computing Systems》2016,59(3):476-499

We consider the k-Server problem under the advice model of computation when the underlying metric space is sparse. On one side, we introduce Θ(1)-competitive algorithms for a wide range of sparse graphs. These algorithms require advice of (almost) linear size. We show that for graphs of size N and treewidth α, there is an online algorithm that receives O(n(log α + log log N)) bits of advice and optimally serves any sequence of length n. We also prove that if a graph admits a system of μ collective tree (q, r)-spanners, then there is a (q + r)-competitive algorithm which requires O(n(log μ + log log N)) bits of advice. Among other results, this gives a 3-competitive algorithm for planar graphs, when provided with O(n log log N) bits of advice. On the other side, we prove that advice of size Ω(n) is required to obtain a 1-competitive algorithm for sequences of length n even for the 2-server problem on a path metric of size N ≥ 3. Through another lower bound argument, we show that at least \(\frac {n}{2}(\log \alpha - 1.22)\) bits of advice are required to obtain an optimal solution for metric spaces of treewidth α, where 4 ≤ α < 2k.

10.

Identity-based fault-tolerant conference key agreement

Xun Yi 《Dependable and Secure Computing, IEEE Transactions on》2004,1(3):170-178

Many conference key agreement protocols have been suggested to secure computer network conferences. Most of them operate only when all conferees are honest, but do not work when some conferees are malicious and attempt to delay or disrupt the conference. Recently, Tzeng proposed a conference key agreement protocol with fault tolerance, in the sense that a common secret conference key among honest conferees can be established even if malicious conferees exist. In the case where a conferee can broadcast different messages in different subnetworks, Tzeng's protocol is vulnerable to a "different key attack" from malicious conferees. In addition, Tzeng's protocol requires each conferee to broadcast to the rest of the group and receive n - 1 messages in a single round (where n stands for the number of conferees). Moreover, it has to handle n simultaneous broadcasts in one round. In this paper, we propose a fault-tolerant conference key agreement protocol, in which each conferee only needs to send one message to a "semitrusted" conference bridge and receive one broadcast message. Our protocol is an identity-based key agreement, built on elliptic curve cryptography. It is resistant to the different key attack from malicious conferees and needs less communication cost than Tzeng's protocol.

11.

Pipelined search on coarse grained networks

Selim G. Akl Frank Dehne 《International journal of parallel programming》1989,18(5):359-364

The time complexity of searching a sorted list of n elements in parallel on a coarse-grained network of diameter D consisting of N processors (where n may be much larger than N) is studied. The worst-case period and latency of a sequence of pipelined search operations are easily seen to be (log n – log N) and (D + log n – log N), respectively. Since for n = N ¹⁺⁽¹⁾ the worst-case period is (log n) (which can be achieved by a single processor), coarse-grained networks appear to be unsuitable for the search problem. By contrast, it is demonstrated using standard queuing theory techniques that a constant expected period can be achieved provided that n = O(N 2^N). This research was supported by the Natural Sciences and Engineering Research Council of Canada under Grants A3336 and A9173.

12.

Backtracking Problem in the Traversal of an Unknown Directed Graph by a Finite Robot

Bourdonov I. B. 《Programming and Computer Software》2004,30(6):305-322

A covering path in a directed graph is a path passing through all vertices and arcs of the graph, with each arc being traversed only in the direction of its orientation. A covering path exists for any initial vertex only if the graph is strongly connected. The traversal of an unknown graph implies that the topology of the graph is not a priori known, and we learn it only in the course of traversing the graph. This is similar to the problem of traversing a maze by a robot in the case where the plan of the maze is not available. If the robot is a general-purpose computer without any limitations on the number of its states, then traversal algorithms with the estimate O(nm) are known, where n is the number of vertices and m is the number of arcs. If the number of states is finite, then this robot is a finite automaton. Such a robot is an analogue of the Turing machine, where the tape is replaced by a graph and the cells are assigned to the graph vertices and arcs. The selection of the arc that has not been traversed yet among those originating from the current vertex is determined by the order of the outgoing arcs, which is a priori specified for each vertex. The best known traversal algorithms for a finite robot are based on constructing the output directed spanning tree of the graph with the root at the initial vertex and traversing it with the aim to find all untraversed arcs. In doing so, we face the backtracking problem, which consists in searching for all vertices of the tree in the order inverse to their natural partial ordering, i.e., from the leaves to the root. Therefore, the upper estimate of the algorithms is different from the optimal estimate O(nm) by the number of steps required for the backtracking along the outgoing tree. The best known estimate O(nm + n² log log n) has been suggested by the author in the previous paper [1]. In this paper, a finite robot is suggested that performs a backtracking with the estimate O(n² log*(n)).
The function log* is defined as the integer solution of the inequality 1 ≤ log₂^(log*(n)) n < 2, where log^t = log ∘ log ∘ ... ∘ log (the superposition ∘ is applied t – 1 times) is the t-th compositional degree of the logarithm. The estimate O(nm + n² log*(n)) for the covering path length is valid for any strongly connected graph for a certain (unfortunately, not arbitrary) order of the outgoing arcs. Interestingly, such an order of the arcs can be marked by symbols of the finite robot traversing the graph. Hence, there exists a robot that traverses the graph twice: the first traversal with the estimate O(nm + n² log log n) and the second traversal with the estimate O(nm + n² log*(n)).

13.

A class of multistage conference switching networks for group communication

Yang Y. Wang J. 《Parallel and Distributed Systems, IEEE Transactions on》2004,15(3):228-243

There is a growing demand for network support for group applications, in which messages from one or more sender(s) are delivered to a large number of receivers. Here, we propose a network architecture for supporting a fundamental type of group communication, conferencing. A conference refers to a group of members in a network who communicate with each other within the group. We consider adopting a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, composed of switch modules with fan-in and fan-out capability for a conference network which supports multiple disjoint conferences. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, which is the maximum number of conflict parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network. Our results show that, for a network of size n × n, the multiplicities of routing conflicts are small constants (between 2 and 4) for an omega network or an indirect binary cube network, while it can be as large as √n/q + 1 for a baseline network, where q is the minimum allowable conference size. Thus, our design for conference networks is based on an omega network or an indirect binary cube network. We also develop fast self-routing algorithms for setting up routing paths in the newly designed conference networks. As can be seen, such an n × n conference network has O(log n) routing time and communication delay and O(n log n) hardware cost. The conference networks are superior to existing designs in terms of routing complexity, communication delay and hardware cost. The conference network proposed is rearrangeably nonblocking in general, and is strictly nonblocking under some conference service policy. It can be used in applications that require efficient or real-time group communication.

14.

A priority queue in which initialization and queue operations take O(log log D) time

Donald B. Johnson 《Theory of Computing Systems》1981,15(1):295-309

Many computer algorithms have embedded in them a subalgorithm called a priority queue which produces on demand an element of extreme priority among elements in the queue. Queues on unrestricted priority domains have a running time of (n log n) for sequences of n queue operations. We describe a simple priority queue over the priority domain {1, ..., N} in which initialization, insertion, and deletion take O(log log D) time, where D is the difference between the next lowest and next highest priority elements in the queue. In the case of initialization, D = (N). Finding a least element, greatest element, and the neighbor in priority order of some specified element take constant time. We also consider dynamic space allocation for the data structures used. Space can be allocated in blocks of size (N^(1/p)), for small integer p. This research was supported by the National Science Foundation under grants MCS 77-21092 and MCS 80-002684.

15.

Fast algorithms for bit-serial routing on a hypercube

William A. Aiello F. T. Leighton Bruce M. Maggs Mark Newman 《Theory of Computing Systems》1991,24(1):253-271

In this paper we describe an O(log N)-bit-step randomized algorithm for bit-serial message routing on a hypercube. The result is asymptotically optimal, and improves upon the best previously known algorithms by a logarithmic factor. The result also solves the problem of on-line circuit switching in an O(1)-dilated hypercube (i.e., the problem of establishing edge-disjoint paths between the nodes of the dilated hypercube for any one-to-one mapping). Our algorithm is adaptive and we show that this is necessary to achieve the logarithmic speedup. We generalize the Borodin-Hopcroft lower bound on oblivious routing by proving that any randomized oblivious algorithm on a polylogarithmic degree network requires at least Ω(log² N/log log N) bit steps with high probability for almost all permutations. This research was supported by the Defense Advanced Research Projects Agency under Contracts N00014-87-K-825 and N00014-89-J-1988, the Air Force under Contract AFOSR-89-0271, and the Army under Contract DAAL-03-86-K-0171. This work was completed while the third and fourth authors were at the Laboratory for Computer Science, Massachusetts Institute of Technology.

16.

Reduction Algorithms for Graphs of Small Treewidth

Hans L. Bodlaender Babette van Antwerpen-de Fluiter 《Information and Computation》2001,167(2):86

This paper presents a number of new ideas and results on graph reduction applied to graphs of bounded treewidth. S. Arnborg, B. Courcelle, A. Proskurowski, and D. Seese (J. Assoc. Comput. Mach. 40, 1134–1164 (1993)) have shown that many decision problems on graphs can be solved in linear time on graphs of bounded treewidth, using a finite set of reduction rules. These algorithms can be used to solve problems on graphs of bounded treewidth without the need to obtain a tree decomposition of the input graph first. We show that the reduction method can be extended to solve the construction variants of many decision problems on graphs of bounded treewidth, including all problems definable in monadic second order logic. We also show that a variant of these reduction algorithms can be used to solve (constructive) optimization problems in O(n) time. For example, optimization and construction variants of I S and H C N can be solved in this way on graphs of small treewidth. Additionally, we show that the results of H. L. Bodlaender and T. Hagerup (SIAM J. Comput. 27, 1725–1746 (1998)) can be applied to our reduction algorithms, which results in parallel reduction algorithms that use O(n) operations and O(log n log* n) time on an EREW PRAM, or O(log n) time on a CRCW PRAM.

17.

Scalable and Practical Nonblocking Switching Networks

Si-Qing Zheng Ashwin Gumaste 《计算机科学技术学报》2006,21(4):466-475

Large-scale strictly nonblocking (SNB) and wide-sense nonblocking (WSNB) networks may be infeasible due to their high cost. In contrast, rearrangeable nonblocking (RNB) networks are more scalable because of their much lower cost. However, RNB networks are not suitable for circuit switching. In this paper, the concept of virtual nonblockingness is introduced. It is shown that a virtual nonblocking (VNB) network functions like an SNB or WSNB network, but it is constructed with the cost of an RNB network. The results indicate that for large-scale circuit switching applications, only VNB networks need to be built.

18.

Parallel solution of recurrences on a tree machine

Roy P. Pargas 《International journal of parallel programming》1984,13(4):251-277

The recurrence x_0 = a_0, x_i = a_i + b_i x_{i–1}, i = 1, 2, ..., n–1, requires O(n) operations on a sequential computer. Elegant parallel solutions exist, however, that reduce the complexity to O(log N) using N ≥ n processors. This paper discusses one such solution, designed for a tree-structured network of processors. A tree structure is ideal for solving recurrences. It takes exactly one sweep up and down the tree to solve any of several classes of recurrences, thus guaranteeing a solution in O(log N) time for a tree with N ≥ n leaf nodes. If n exceeds N, the algorithm efficiently pipelines the operation and solves the recurrence in O(n/N + log N) time.
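A first-order linear recurrence of the form x_0 = a_0, x_i = a_i + b_i x_{i-1} can be solved with logarithmic depth because each step is an affine map x -> a_i + b_i x and composition of affine maps is associative. The sketch below is not the tree-machine algorithm of the paper; it is a sequential simulation of a generic balanced-tree prefix computation, and the names `compose`, `prefix_maps` and `solve_recurrence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map x -> a + b * x


def compose(first: Affine, second: Affine) -> Affine:
    """Return the affine map equivalent to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 + b2 * a1, b2 * b1)


def prefix_maps(maps: Sequence[Affine]) -> List[Affine]:
    """Inclusive prefix composition by recursive halving (recursion depth O(log n))."""
    if len(maps) == 1:
        return [maps[0]]
    mid = len(maps) // 2
    left = prefix_maps(maps[:mid])    # the two halves are independent, so a
    right = prefix_maps(maps[mid:])   # parallel machine could do them together
    carry = left[-1]                  # effect of the whole left half
    return left + [compose(carry, r) for r in right]


def solve_recurrence(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Solve x_0 = a[0], x_i = a[i] + b[i] * x_{i-1}; b[0] is ignored."""
    maps = [(a[0], 0)] + [(a[i], b[i]) for i in range(1, len(a))]
    # The first map is constant, so every prefix map is constant too and
    # its constant term is exactly x_i.
    return [m[0] for m in prefix_maps(maps)]


print(solve_recurrence([1, 2, 3, 4], [0, 2, 3, 4]))
```

Treating x_0 as a constant map keeps the code uniform: no separate down-sweep with an initial value is needed, at the cost of carrying a multiplier that is always zero.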

19.

Faster Deterministic Communication in Radio Networks

Ferdinando Cicalese Fredrik Manne Qin Xin 《Algorithmica》2009,54(2):226-242

We study the communication primitives of broadcasting (one-to-all communication) and gossiping (all-to-all communication) in known topology radio networks, i.e., where for each primitive the schedule of transmissions is precomputed based on full knowledge about the size and the topology of the network. We show that gossiping can be completed in time units in any radio network of size n, diameter D, and maximum degree Δ=Ω(log n). This is an almost optimal schedule in the sense that there exists a radio network topology, specifically a Δ-regular tree, in which the radio gossiping cannot be completed in less than units of time. Moreover, we show a schedule for the broadcast task. Both our transmission schemes significantly improve upon the currently best known schedules by Gąsieniec, Peleg, and Xin (Proceedings of the 24th Annual ACM SIGACT-SIGOPS PODC, pp. 129–137, 2005), i.e., an O(D+Δ log n) time schedule for gossiping and a D+O(log³ n) time schedule for broadcast. Our broadcasting schedule also improves, for large D, a very recent O(D+log² n) time broadcasting schedule by Kowalski and Pelc. A preliminary version of this paper appeared in the proceedings of ISAAC’06. F. Cicalese supported by the Sofja Kovalevskaja Award 2004 of the Alexander von Humboldt Stiftung. F. Manne and Q. Xin supported by the Research Council of Norway through the SPECTRUM project.

20.

Parallel processing can be harmful: The unusual behavior of interpolation search

Dan E. Willard John H. Reif 《Information and Computation》1989,81(3)

Several articles have noted the usefulness of a retrieval algorithm called sequential interpolation search, and Yao and Yao have proven a lower bound log log N − O(1), showing this algorithm is actually optimal up to an additive constant on unindexed files of size N generated by the uniform probability distribution. We generalize the latter to show log log N − log log P − O(1) lower bounds the complexity of any retrieval algorithm with P parallel processors for searching an unindexed file of size N. This result is surprising because we also show how to obtain an upper bound that matches the lower bound up to an additive constant with a procedure that actually uses no parallel processing outside its last iteration (at which time our proposal turns on P processors in parallel). Our first theorem therefore states that parallel processing before the literally last iteration in the search of an unindexed ordered file has nearly no usefulness. Two further surprising facts are that the preceding result holds even when communication between the parallel processing units involves no delay and that the parallel algorithms are actually inherently slower than their sequential counterparts when each invocation of the SIMD machine invokes a communication step with any type of nonzero delay. The presentation in the first two chapters of this paper is quite informal, so that the reader can quickly grasp the underlying intuition.
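For readers unfamiliar with the sequential algorithm discussed above, a minimal textbook-style sketch of interpolation search over a sorted list of integers follows. It is a generic illustration, not the parallel procedure of the paper, and the function name `interpolation_search` is an assumption made here.

```python
from typing import Sequence


def interpolation_search(keys: Sequence[int], target: int) -> int:
    """Return an index of `target` in the sorted list `keys`, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[lo] == keys[hi]:
            # All remaining keys are equal; the loop condition implies a match.
            return lo
        # Probe where the target would sit if keys were evenly spread.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = list(range(2, 21, 2))  # 2, 4, ..., 20
print(interpolation_search(keys, 14), interpolation_search(keys, 7))
```

On keys drawn from a uniform distribution the probe position is usually close to the target, which is the intuition behind the log log N behavior quoted in the abstract.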