期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel general prefix computations with geometric,algebraic, and other applications

Frederick Springsteel Ivan Stojmenović 《International journal of parallel programming》1989,18(6):485-503

We introduce a generic problem component that captures the most common, difficult kernel of many problems. This kernel involves general prefix computations (GPC). GPC's lower bound complexity of (n logn) time is established, and we give optimal solutions on the sequential model inO(n logn) time, on the CREW PRAM model inO(logn) time, on the BSR (broadcasting with selective reduction) model in constant time, and on mesh-connected computers inO(n) time, all withn processors, plus anO(log² n) time solution on the hypercube model. We show that GPC techniques can be applied to a wide variety of geometric (point set and tree) problems, including triangulation of point sets, two-set dominance counting, ECDF searching, finding two-and three-dimensional maximal points, the reconstruction of trees from their traversals, counting inversions in a permutation, and matching parentheses.work partially supported by NSF IRI/8709726work partially supported by NSERC. 相似文献

2.

The k-ary n-direct s-indirect family of topologies for large-scale interconnection networks

Roberto Peñaranda Crispín Gómez María E. Gómez Pedro López Jose Duato 《The Journal of supercomputing》2016,72(3):1035-1062

相似文献

3.

Parallel computations on graphs

E. A. Ivanov V. P. Shevchenko 《Cybernetics and Systems Analysis》1984,20(3):418-425

相似文献

4.

基于压缩子空间对齐的多核聚类算法

欧琦媛祝恩《计算机工程与科学》2021,42(10):1730-1735

近年来,多核聚类（MKC）在融合多源信息以提高聚类性能方面取得了显著进展。但是,以n表示样本数,O(n2)内存消耗和On3计算消耗限制了这些方法的实用性。重新设计了基于子空间分割的MKC公式,从而将其内存和计算复杂度分别降低到O(n)和O(n2)。在该算法（基于压缩子空间对齐的多核聚类算法CSA-MKC）中,通过对部分数据采样来重建整个数据集。具体而言,在该算法中,在信息融合过程中同时学习了共识采样矩阵,从而使生成的锚点集更适合于跨不同视图的数据重建。因此,改进了重构矩阵的判别性,并增强了聚类性能。此外,该算法易于并行化,通过GPU加速,在6个数据集上进行了测试,在时间上,该算法是数据规模的平方复杂度,在性能上,优于目前的先进算法。相似文献

5.

Parallel prefix computation on extended multi-mesh network

Prasanta K. Jana B. Damodara NaiduShailendra Kumar Monish AroraBhabani P. Sinha 《Information Processing Letters》2002,84(6):295-303

相似文献

6.

Optimal computation of prefix sums on a binary tree of processors

Henk Meijer Selim G. Akl 《International journal of parallel programming》1987,16(2):127-136

Givenn numbersa ₀,a ₁,...,a _n–1, it is required to compute all sums of the forma ₀+a ₁+...+a _i, fori=0, 1,...,n–1. This problem arises in many applications and is trivial to solve sequentially in O(n) time. Besides its practical importance, the problem gains an additional theoretical interest in parallel computation. A technique known asrecursive doubling allows all sums to be computed in O(logn) time on a model of computation wheren processors communicate through aninverse perfect suffle interconnection network. In this paper we show how the problem can be solved on a simple network, namely abinary tree of processors. In addition, we show how to extend our solution to obtain an optimal-cost algorithm. The algorithm usesp processors and runs in O((n/p)+logp) time, for a cost of O(n+p logp). This cost is optimal whenp logp=O(n). Finally, two applications of our results are illustrated, namely job scheduling with deadlines and the knapsack problem.This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grants A0282 and A3336. 相似文献

7.

Bitonic sort on a chained-cubic tree interconnection network

Sherenaz W. Al-Haj Baddar Basel A. Mahafzah 《Journal of Parallel and Distributed Computing》2014

Bitonic sort is one of the fastest oblivious parallel sorting algorithms known so far. Due to its high modularity, bitonic sort can be mapped to different interconnection networks. In this paper, the bitonic sort algorithm is mapped to the chained-cubic tree (CCT) interconnection network. It is shown that the computation time of the bitonic sort on a CCT (BSCCT) algorithm is O((n/p)×log(np))

O ((n / p) \times log (n p))

and that the communication cost is O(plog²p)

O (p {log}^{2} p)

, assuming that n

n

keys are evenly distributed among p

p

processors that comprise a given CCT network. Simulation is implemented and used to assess the performance of the BSCCT algorithm in terms of computation time, communication cost, message delay, and key comparisons. Simulation results showed that the BSCCT algorithm achieves a speedup that is almost 12-fold relative to a bitonic sort on a single processor, when 1024 processors were used to sort 32M keys. 相似文献

8.

Parallel computations and committee constructions

V. D. Mazurov M. Yu. Khachai 《Automation and Remote Control》2007,68(5):912-921

The paper reviewed the results bearing out the deep-seated relation between the parallel computations and learning procedures for the laminated neural networks one of whose formalizations is represented by the theory of committee constructions. Additionally, consideration was given to two combinatorial problems concerned with learning pattern recognition in the class of affine committees—the problem of verifying existence of a three-element affine separating committee and that of element-minimal affine separating committee. The first problem was shown to be N P-complete, whereas the second problem is N P-hard and does not belong to the Apx class. 相似文献

9.

Formation stability analysis of unmanned multi-vehicles under interconnection topologies

Aolei Yang Wasif Naeem Minrui Fei 《International journal of control》2013,86(4):754-767

In this paper, the overall formation stability of an unmanned multi-vehicle is mathematically presented under interconnection topologies. A novel definition of formation error is first given and followed by the proposed formation stability hypothesis. Based on this hypothesis, a unique extension–decomposition–aggregation scheme is then employed to support the stability analysis for the overall multi-vehicle formation under a mesh topology. It is proved that the overall formation control system consisting of N number of nonlinear vehicles is not only asymptotically stable, but also exponentially stable in the sense of Lyapunov within a neighbourhood of the desired formation. This technique is shown to be applicable for a mesh topology but is equally applicable for other topologies. A simulation study of the formation manoeuvre of multiple Aerosonde UAVs (unmanned aerial vehicles), in 3-D space, is finally carried out verifying the achieved formation stability result. 相似文献

10.

Parallel microprograms and infinite computations

S. M. Achasova 《Cybernetics and Systems Analysis》1987,23(2):202-206

相似文献

11.

A class of hierarchical graphs as topologies for interconnection networks

Pao-Lien Lai Hong-Chun Hsu Chang-Hsiung Tsai Iain A. Stewart 《Theoretical computer science》2010,411(31-33):2912-2924

相似文献

12.

Symposium on Parallel computations and their impact on mechanics

《Computer Methods in Applied Mechanics and Engineering》1987,63(3):318

相似文献

13.

Parallel prefix operations on GPU: tridiagonal system solvers and scan operators

Diéguez Adrián P. Amor Margarita Doallo Ramón 《The Journal of supercomputing》2019,75(3):1510-1523

The Journal of Supercomputing - Modern GPUs can achieve high computing power at low cost, but still requires much time and effort. Tridiagonal system and scan solvers are one example of widely used... 相似文献

14.

Target containment control of multi-agent systems with random switching interconnection topologies

Youcheng Lou Yiguang Hong 《Automatica》2012,48(5):879-885

In this paper, the distributed containment control is considered for a second-order multi-agent system guided by multiple leaders with random switching topologies. The multi-leader control problem is investigated via a combination of convex analysis and stochastic process. The interaction topology between agents is described by a continuous-time irreducible Markov chain. A necessary and sufficient condition is obtained to make all the mobile agents almost surely asymptotically converge to the static convex leader set. Moreover, conditions on the tracking estimation are provided for the convex target set determined by moving multiple leaders. 相似文献

15.

Parallel solution of recurrences on a tree machine

Roy P. Pargas 《International journal of parallel programming》1984,13(4):251-277

The recurrencex _o =a _o x _i =a _i+b _i x _i–1,i = 1, 2,...,n–1 requiresO(n) operations on a sequential computer. Elegant parallel solutions exist, however, that reduce the complexity toO(logN) usingNn processors. This paper discusses one such solution, designed for a tree-structured network of processors.A tree structure is ideal for solving recurrences. It takes exactly one sweep up and down the tree to solve any of several classes of recurrences, thus guaranteeing a solution inO(logN) time for a tree withNn leaf nodes. Ifn exceedsN, the algorithm efficiently pipelines the operation and solves the recurrence inO(n/N + logN) time. 相似文献

16.

A family of codes commutatively equivalent to prefix codes

S. Mauceri A. Restivo 《Information Processing Letters》1981,12(1):1-4

相似文献

17.

A family of parallel prefix algorithms embedded in networks

Takesue M. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(10):1179-1184

This paper presents a family of algorithms for producing, from (υ₀, υ₁, ..., υ_n-1), all initial prefixes x_i=υ₀&thetas;υ₁&thetas;···&thetas;υ_i (i=0, 1, ..., n-1) in parallel in interconnection networks such as the omega network and the hypercube, where &thetas; is an associative binary operator. Each algorithm can be embedded in the switches and interconnections of the network, and can be executed in O((log₂ r+1) log_r n) time steps provided that the network connecting n processors is constructed by using an r×r switch, and that parallelism within as well as among individual switches is exploited. The objective of these algorithms is to attain a communication pattern that fits the topology of the network. One type of network can be made equivalent to, or can be embedded in, another type of network, so a family of algorithms can be derived from one basic algorithm. In the basic algorithm, every processor p_i upward multicasts υ_i to processors p_k (k=i+1, i+2, ..., n - 1). En route to p_i, υ_j (j=0, 1, ..., i - 1) are combined in the switches to produce the (i - 1)th initial prefix x_i-1 that is received by p_i, which can then compute the ith initial prefix x_i=x_i-1&thetas;υ_i 相似文献

18.

Parallel radiation dose computations with GENOCOP III on GPUs

Moreno J. J. Miroforidis J. Filatovas E. Kaliszewski I. Garzón E. M. 《The Journal of supercomputing》2021,77(1):66-76

The Journal of Supercomputing - In radiotherapy planning, which involves optimization, efforts to produce better (more effective and at the same time with lesser adverse effects) patient radiation... 相似文献

19.

Cyclic-cubes: a new family of interconnection networks of evenfixed-degrees

Ada Wai-Chee Fu Siu-Cheung Chau 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(12):1253-1268

We introduce a new family of interconnection networks that are Cayley graphs with fixed degrees of any even number greater than or equal to four. We call the proposed graphs cyclic-cubes because contracting some cycles in such a graph results in a generalized hypercube. These Cayley graphs have optimal fault tolerance and logarithmic diameters. For comparable number of nodes, a cyclic-cube can have a diameter smaller than previously known fixed-degree networks. The proposed graphs can adopt an optimum routing algorithm known for one of its subfamilies of Cayley graphs. We also show that a graph in the new family has a Hamiltonian cycle and, hence, there is an embedding of a ring. Embedding of meshes and hypercubes are also discussed 相似文献

20.

Controlling the data space of tree structured computations

I. Gottlieb B. Obreni 《Information and Computation》2003,187(2):246-276

We study the problem of scheduling a parallel computation so as to minimize the maximum number of data items extant at any point in the execution. Computations are expressed as directed graphs, where nodes represent primitive operations and arcs represent data dependences. The result of an operation is extant after the operation executes and until all immediate successors have begun execution. Our goal is to schedule computations so as to minimize both the maximum space required for extant data and the overall completion time.The classical problem of multiprocessor scheduling with precedence constraints is a special case of our problem, obtained by disregarding the data-space constraint. This special case is NP-complete for general graphs; a time-optimal multiprocessor scheduling algorithm is known only for the class of arbitrary trees. For this same class of arbitrary trees we present a multiprocesssor scheduling algorithm where the completion time is optimal within a constant factor, while the data-space size exceeds the optimal by a factor not greater than the number of processors.For an arbitrary n-node precedence tree T of in-degree Δ, we present:

(1)an algorithm for evaluating the lower bound on the size of data space required for executing T, regardless of the completion time or number of processors;
(2)a proof that the lower bound of Part 1 may be as large as (Δ−1)lgn but not larger;
(3)a single-processor schedule that executes T in time that equals the optimal, while creating the data space of size equal to the lower bound of Part 1;
(4)an ω-processor schedule that executes T in time not exceeding three times the optimal, while creating the data space of size that exceeds the lower bound of Part 1 by a factor not greater than ω.
(5)a proof that for every number of processors ω and for every 0<ε1, there exist infinitely many trees such that every ω-processor schedule that executes any of these trees in time not exceeding (2−ε) times the optimal requires a token space as large as that created by the schedule of Part 4, while the schedule of Part 4 executes every such tree in optimal time.

The family of complete binary trees provides an example where our schedule achieves an exponential improvement in the size of the data space, compared to that of the classical time-optimal schedule. 相似文献