首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Distributed memory multiprocessor (DMMP) systems have gained much attention because their performance can be easily scaled up by increasing the number of processor-memory modules. The k-ary n-cube is the most popular interconnection network topology currently used in DMMPs. Wormhole routing is one of the most promising switching technology and has been used in many new generation multicomputers. Wormhole routing makes the communication latency insensitive to the network diameter and reduces the size of the channel buffer of each router. The concept of virtual channels and virtual networks are widely invented for deadlock-free design. A fully adaptive wormhole routing method for k-ary n-cubes has been proposed by Linder in 1991 [10]. Unfortunately, the need of 2n − 1 virtual networks makes it unreasonable. In this paper, we propose a virtual network system to support an adaptive, minimal and deadlock free routing in k-ary n-cubes. It uses only four virtual networks but can get a higher degree of adaptability and higher traffic capacity. Simulation results are presented to verify the performance.  相似文献   

2.
冯凯 《计算机应用》2017,37(9):2454-2456
为了度量发生故障时kn方体对其可匹配性的保持能力,通过剖析条件故障下使得kn方体中不存在完美匹配或几乎完美匹配所需故障集的构造,研究了条件故障下使得kn方体不可匹配所需的最小故障数。当k ≥ 4为偶数且n ≥ 2时,得出了kn方体这一容错性参数的精确值并对其所有相应的最小故障集进行了刻画;当k ≥ 3为奇数且n ≥ 2时,给出了该kn方体容错性参数的一个可达下界和一个可达上界。结果表明,选取k为奇数的kn方体作为底层互连网络拓扑设计的并行计算机系统在条件故障下对其可匹配性有良好的保持能力;进一步地,该系统在故障数不超过2n时仍是可匹配的,要使该系统不可匹配至多需要4n-3个故障元。  相似文献   

3.
冯凯  李婧 《计算机应用》2019,39(11):3323-3327
并行计算机系统功能的实现很大程度上依赖于系统互连网络的性能。为了精确度量以kn方体为底层拓扑结构的并行计算机系统的容错能力,研究了点故障模型下kn方体中k元(n-1)方体子网络的可靠性。当k ≥ 3且为奇数时,分别在固定划分模式和灵活划分模式下对kn方体中不同数目的k元(n-1)方体子网络保持无故障状态的平均失效时间进行了分析,并得出了这一子网络可靠性评估参数的计算公式。结果表明,当基于k为奇数的kn方体构建的并行计算机系统指派子网络执行用户任务时,在点故障模型下灵活划分模式相比固定划分模式有着更好的容错能力。  相似文献   

4.
In this paper, a general class of Boolean n-cube structures is investigated. The interconnection is based on a mixed radix number system which results in a variety of generalized Boolean n-cube structures for a given number of processors , where x and y are positive integers. A number of interesting properties of the network are revealed. By a constructive method, the node connectivity of this network is found. Finally, we show that the graph is super-λ and is an optimal reliable structure for interconnection networks.  相似文献   

5.
Optimal location query in road networks is a basic operation in the location intelligence applications. Given a set of clients and servers on a road network, the purpose of optimal location query is to obtain a location for a new server, so that a certain objective function calculated based on the locations of clients and servers is optimal. Existing works assume no labels for servers and that a client only visits the nearest server. These assumptions are not realistic and it renders the existing work not useful in many cases. In this paper, we relax these assumptions and consider the k nearest neighbours (KNN) of clients. We introduce the problem of KNN-based optimal location query (KOLQ) which considers the k nearest servers of clients and labeled servers. We also introduce a variant problem called relocation KOLQ (RKOLQ) which aims at relocating an existing server to an optimal location. Two main analysis algorithms are proposed for these problems. Extensive experiments on the real road networks illustrate the efficiency of our proposed solutions.  相似文献   

6.
Incomplete hypercubes are gaining increasing attention as one of the possible solutions for the limitation on the number of nodes in the hypercubes. Distributing, in which one node sends distinct messages to distinct nodes, and its reverse operation are frequently used operations on data parallel computation. In this paper, we propose two types of spanning trees in incomplete hypercubes (composed of an n-cube and a k-cube): the binomial hierarchical spanning tree-binomial HST, and the balanced hierarchical spanning tree-balanced HST, for distributing (and its reverse operation). Distributing algorithms based on a balanced HST take better advantage of overlap between communication ports, or have speedup up to k/2 or n/2 over that based on the binomial HST when concurrently communication on all ports are possible, from the view point of communication complexity. An algorithm to construct hierarchical spanning trees in general incomplete hypercubes, which consist of an arbitrary number of nodes, is also devised.  相似文献   

7.
We propose a mathematical model for fault-tolerant routing based on acyclic orientations, or acorns, of the underlying network G=(V,E). The acorn routing model applies routing tables that store the set of parent pointers associated with each out-neighborhood defined by the acorn. Unlike the standard single-parent sink-tree model, which is vulnerable to faults, the acorn model affords a full representation of the entire network and is able to dynamically route around faults. This fault tolerance is achieved when using the acorn model as a multi-tree generator for gathering data at a destination node, as well as an independent tree generator for global point-to-point communication. A fundamental fault-tolerant measure of the model is the capacity of an acorn, i.e., the largest integer k such that each vertex outside the neighborhood N(v) of the destination v has at least k parent pointers. A capacity-k acorn A to destination v is k-vertex fault-tolerant to v. More strongly, we show A supports a k independent sink-tree generator, i.e., the parent pointers of each vertex w VN(v) can be partitioned into k nonempty classes labeled 1,2,…,k such that any set of sink trees T1,T2,…,Tk are pairwise independent, where tree Ti is a sink tree generated by parent pointers labeled i together with the parent pointers into v. We present an linear time optimization algorithm for finding an acorn A of maximum capacity in graphs, based upon a minimax theorem. We also present efficient algorithms that label the parent pointers of capacity-k acorn A, yielding a k-independent sink tree generating scheme.  相似文献   

8.
In the long-lived M-renaming problem, N processes repeatedly acquire and release names ranging over {0, …, M − 1}, where M < N. It is assumed that at most k M processes concurrently request or hold names. Efficient solutions to the long-lived renaming problem can be used to improve the performance of applications in which processes repeatedly perform computations whose time complexity depends on the size of the name space containing the processes that participate concurrently. In this paper, we consider wait-free solutions to the long-lived M -renaming problem that use only read and write instructions in an asynchronous, shared-memory multiprocessor. A solution to long-lived renaming is fast if the time complexity of acquiring and releasing a name once is independent of N. We present a new fast, long-lived (k(k + 1)/2)renaming algorithm that significantly improves upon the time and space complexity of similar previous algorithms, while providing a much simpler solution. We also show that fast, long-lived (2k − 1)-renaming can be implemented with reads and writes. This result is optimal with respect to the size of the name space.  相似文献   

9.
In this paper we introduce two pattern classifiers for non-sparse data (i.e. data with overlapping class distributions) which use the optimal interpolative neural network (OI-net), derived by one of the authors based on a generalized Fock (GF) space formulation. We present a statistical pattern classifier operating as a two-stage algorithm. The first stage consists of a pre-processing operation involving a k-N N editing of the original training set T. The operation results in a new training set, Te, which in the second stage is classified by an OI-net constructed by the recursive least squares algorithm. We also propose a new data specific classifier which has an additional third computational stage, in which samples of the original training set are added to the network piece by piece until satisfactory classification results are obtained. During the computation process the training set is iteratively updated until the number of mis-classified samples is minimized. The performance of these two classifiers has been evaluated in some illustrative examples.  相似文献   

10.
Visual cryptography and (k,n)-visual secret sharing schemes were introduced by Naor and Shamir (Advances in Cryptology — Eurocrypt 94, Springer, Berlin, 1995, pp. 1–12). A sender wishing to transmit a secret message distributes n transparencies amongst n recipients, where the transparencies contain seemingly random pictures. A (k,n)-scheme achieves the following situation: If any k recipients stack their transparencies together, then a secret message is revealed visually. On the other hand, if only k−1 recipients stack their transparencies, or analyze them by any other means, they are not able to obtain any information about the secret message. The important parameters of a scheme are its contrast, i.e., the clarity with which the message becomes visible, and the number of subpixels needed to encode one pixel of the original picture. Naor and Shamir constructed (k,k)-schemes with contrast 2−(k−1). By an intricate result from Linial (Combinatorica 10 (1990) 349–365), they were also able to prove the optimality of these schemes. They also proved that for all fixed kn, there are (k,n)-schemes with contrast . For k=2,3,4 the contrast is approximately and . In this paper, we show that by solving a simple linear program, one is able to compute exactly the best contrast achievable in any (k,n)-scheme. The solution of the linear program also provides a representation of a corresponding scheme. For small k as well as for k=n, we are able to analytically solve the linear program. For k=2,3,4, we obtain that the optimal contrast is at least and . For k=n, we obtain a very simple proof of the optimality of Naor's and Shamir's (k,k)-schemes. In the case k=2, we are able to use a different approach via coding theory which allows us to prove an optimal tradeoff between the contrast and the number of subpixels.  相似文献   

11.
《Parallel Computing》1988,6(2):235-245
We investigate the use of a hypercube packet switching network as a shared memory server for vector multiprocessors. Using the generalization of a high performance switch node introduced in an earlier paper, we develop a packet switched memory server capable of providing high bandwidth vector access to a shared memory. The network exhibits adaptive behavior, absorbing conflicts as a vector operation proceeds, and delivers full vector bandwidth to all processors simultaneously.In addition to its vector performance, the hypercube has another feature that makes it attractive as a shared memory server. The memory words are not equidistant from the processors. A hierarchy of distances occurs. By taking advantage of this one can provide segments of fast access memory within the global shared memory environment. This makes the shared memory hypercube very promising as a general purpose parallel computer.  相似文献   

12.
In this paper, we consider the problem of recognizing whether a given network is a rectangular mesh. We present an efficient distributed algorithm with an O(N) message and time complexity, where N is the number of nodes in the network. This is an improvement of a previous algorithm presented in Mohan (1990) with a message complexity of O(N log N) and time complexity of O(N1.6). The proposed algorithm is constructive in nature and also assigns coordinates to the nodes of the network.  相似文献   

13.
We consider the iterative solution of large sparse linear systems of equations arising from elliptic and parabolic partial differential equations in two or three space dimensions. Specifically, we focus our attention on nonsymmetric systems of equations whose eigenvalues lie on both sides of the imaginary axis, or whose symmetric part is not positive definite. This system of equation is solved using a block Kaczmarz projection method with conjugate gradient acceleration. The algorithm has been designed with special emphasis on its suitability for multiprocessors. In the first part of the paper, we study the numerical properties of the algorithm and compare its performance with other algorithms such as the conjugate gradient method on the normal equations, and conjugate gradient-like schemes such as ORTHOMIN(k), GCR(k) and GMRES(k). We also study the effect of using various preconditioners with these methods. In the second part of the paper, we describe the implementation of our algorithm on the CRAY X-MP/48 multiprocessor, and study its behavior as the number of processors is increased.  相似文献   

14.
In many calculations, spectral discretization in space is coupled with a standard ordinary differential equation formula in time. To analyze the stability of such a combination, one would like simply to test whether the eigenvalues of the spatial discretization operator (appropriately scaled by the time step k) lie in the stability region for the o.d.e. formula, but it is well known that this kind of analysis is in general invalid. In the present paper we rehabilitate the use of stability regions by proving that a discrete linear multistep ‘method of lines’ approximation to a partial differential equation is Lax-stable, within a small algebraic factor, if and only if all of the -pseudo-eigenvalues of the spatial discretization operator lie within O() of the stability region as → 0. An -pseudo-eigenvalue of a matrix A is any number that is an eigenvalue of some matrix A + E with E ; our arguments make use of resolvents and are closely related to the Kreiss matrix theorem. As an application of our general result, we show that an explicit N-point Chebyshev collocation approximation of ut = −xux on [−1, 1] is Lax-stable if and only if the time step satisfies k = O(N−2), although eigenvalue analysis would suggest a much weaker restriction of the form k CN−1.  相似文献   

15.
Support vector ordinal regression (SVOR) is a recently proposed ordinal regression (OR) algorithm. Despite its theoretical and empirical success, the method has one major bottleneck, which is the high computational complexity. In this brief, we propose a both practical and theoretical guaranteed algorithm, block-quantized support vector ordinal regression (BQSVOR), where we approximate the kernel matrix K with [(K)tilde] that is composed of k 2 constant blocks. We provide detailed theoretical justification on the approximation accuracy of BQSVOR. Moreover, we prove theoretically that the OR problem with the block-quantized kernel matrix [(K)tilde] could be solved by first separating the data samples in the training set into k clusters with kernel k-means and then performing SVOR on the k cluster representatives. Hence, the algorithm leads to an optimization problem that scales only with the number of clusters, instead of the data set size. Finally, experiments on several real-world data sets support the previous analysis and demonstrate that BQSVOR improves the speed of SVOR significantly with guaranteed accuracy.  相似文献   

16.
This paper describes an end-to-end parallel communications scheme based on a vector routing algorithm (VRA) for ATM network control and management. An information string is partitioned into m parts, which are then coded into k > m parts and sent out on k separate subchannels to the receiver. When m of the k parts are received correctly, the original information can be reconstructed. Two desirable effects are achieved in the context of ATM traffic control: (1) the burstiness of the source traffic can be smoothed out by the partition process; (2) the quality of service in terms of error, loss and delay can be controlled using the number of redundant routes km as a control parameter. Our results show that VRA is especially suitable for services with highly bursty traffic. We argue that several network management issues, including reliability, evolution and integration, security, and administration and billing can be addressed in a simple manner using the VRA framework.  相似文献   

17.
This paper presents a new approach to implement fast barrier synchronization in wormhole k-ary n-cubes. The novelty lies in using multidestination messages instead of the traditional single destination messages. Two different multidestination worm types, gather and broadcasting, are introduced to implement the report and wake-up phases of barrier synchronization, respectively. Algorithms for complete and arbitrary set barrier synchronization are presented using these new worms. It is shown that complete barrier synchronization in a k-ary n-cube system with e-cube routing can be implemented with 2n communication start-ups as compared to 2nlog2k start-ups needed with unicast-based message passing. This leads to an asymptotic improvement by a factor of log2k. Simulation results for different system and architectural parameters indicate that the new framework can reduce barrier synchronization cost considerably compared to the unicast-based scheme. For arbitrary set barrier, an interesting trend is observed where the synchronization cost keeps on reducing beyond a certain number of participating nodes. The framework demonstrates potential for supporting fast barrier synchronization in large wormhole-routed systems.  相似文献   

18.
In this paper, a new recurrent neural network is proposed for solving convex quadratic programming (QP) problems. Compared with existing neural networks, the proposed one features global convergence property under weak conditions, low structural complexity, and no calculation of matrix inverse. It serves as a competitive alternative in the neural network family for solving linear or quadratic programming problems. In addition, it is found that by some variable substitution, the proposed network turns out to be an existing model for solving minimax problems. In this sense, it can be also viewed as a special case of the minimax neural network. Based on this scheme, a k-winners-take-all (k-WTA) network with O(n) complexity is designed, which is characterized by simple structure, global convergence, and capability to deal with some ill cases. Numerical simulations are provided to validate the theoretical results obtained. More importantly, the network design method proposed in this paper has great potential to inspire other competitive inventions along the same line.  相似文献   

19.
h-Out-of-k mutual exclusion is a generalization of the 1-mutual exclusion problem, where there are k units of shared resources and each process requests h (1hk) units at the same time. Though k-arbiter has been shown to be a quorum-based solution to this problem, quorums in k-arbiter are much larger than those in the 1-coterie for 1-mutual exclusion. Thus, the algorithm based on k-arbiter needs many messages. This paper introduces the new notion that each request uses different quorums depending on the number of units of its request. Based on the notion, this paper defines two (h,k)-arbiters for h-out-of-k mutual exclusion: a uniform (h,k)-arbiter and a (k+1)-cube (h,k)-arbiter. The quorums in each (h,k)-arbiter are not larger than the ones in the corresponding k-arbiter; consequently, it is more efficient to use (h,k)-arbiters than the k-arbiters. A uniform (h,k)-arbiter is a generalization of the majority coterie for 1-mutual exclusion. A (k+1)-cube (h,k)-arbiter is a generalization of square grid coterie for 1-mutual exclusion.  相似文献   

20.
Nested dissection is a very popular direct method for solving sparse linear systems that arise from finite difference and finite element methods. Worley and Schreiber [16] give a fine grain algorithm for a square array of processors. Their algorithm uses O(N2) processors, each with O(N) memory, to factor an N2 by N2 sparse matrix whose graphs is an N × N mesh. The efficiency of their method is between 1/46 and 1/12. George et al. [6] [8] give a medium grain algorithm for hypercube architecture, while George et al. [7] give an algorithm for shared memory machines. These papers present a column oriented approach which can exploit O(N) parallelism and yield efficiencies up to 50%. Lucas [11] also gives a column oriented scheme which achieves up to 75% efficiency and O(N) parallelism. In this paper, we present a medium to fine grain algorithm for a P × P array of processors with local memory. This algorithm can exploit up to O(N2) parallelism. The efficiency of the fine grain version is comparable to [16] while as a medium grain algorithm achieves about 49% efficiency. The strength of the method is due to three factors: its ability to pipeline much of the computation, overlapping computation and communication, and the use of level 3 BLAS like primitives. In addition to its high efficiency its memory requirement is optimal, only O(N2 log N/P2) words memory is needed per processor.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号