期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Fast problem-size-independent parallel prefix circuits

Yen-Chun Lin Li-Ling HungAuthor Vitae 《Journal of Parallel and Distributed Computing》2009

A family of parallel algorithms solving the prefix problem on the combinational circuit model is presented. These prefix circuits are waist-size optimal with waist 1 (WSO-1). They are not only building blocks for constructing fast depth-size optimal prefix circuits, but also themselves fast problem-size-independent prefix circuits. When the problem size is greater than the circuit width, the presented prefix circuits may very much faster than any other prefix circuits of the same width, especially when the problem size is greater than or equal to twice the circuit width. The new prefix circuits are compared analytically with other representative prefix circuits to show how fast they are. They have the minimum depth and are the fastest among all WSO-1 prefix circuits of the same width and fan-out. Thus, they are better building blocks than other WSO-1 circuits for constructing fast depth-size optimal prefix circuits with the same fan-out. 相似文献

2.

A class of almost-optimal size-independent parallel prefix circuits

Hatem M. El-Boghdadi 《Journal of Parallel and Distributed Computing》2013

Prefix computation is one of the fundamental problems that can be used in many applications such as fast adders. Most proposed parallel prefix circuits assume that the circuit is of the same width as the input size. 相似文献

3.

Parallel construction of binary trees with near optimal weighted path length

D. G. Kirkpatrick T. Przytycka 《Algorithmica》1996,15(2):172-192

We present parallel algorithms to construct binary trees with almost optimal weighted path length. Specifically, assuming that weights are normalized (to sum up to one) and error refers to the (absolute) difference between the weighted path length of a given tree and the optimal tree with the same weights, we present anO (logn)-time andn(log lognl logn)-EREW-processor algorithm which constructs a tree with error less than 0.18, andO (k logn log^* n)-time andn-CREW-processor algorithm which produces a tree with error at most l/n ^k, and anO (k ² logn)-time andn ²-CREW-processor algorithm which produces a tree with error at most l/n ^k. As well, we describe two sequential algorithms, anO(kn)-time algorithm which produces a tree with error at most l/n ^k, and anO(kn)-time algorithm which produces a tree with error at most . The last two algorithms use different computation models.The first author's research was supported in part by NSERC Research Grant 3053. A part of this work was done while the second author was at the University of British Columbia. 相似文献

4.

Parallel strong orientation of an undirected graph

Mikhail J. Atallah 《Information Processing Letters》1984,18(1):37-39

A dominating set of an undirected graph G is a set D of nodes such that every node of G either is in D or is adjacent to some node of D. It is shown that the problem of finding a minimum cardinality dominating set is NP-complete for split graphs (a subclass of chordal graphs) and bipartite graphs. 相似文献

5.

Simulating Boolean Circuits on a DNA Computer 总被引：6，自引：0，他引：6

M. Ogihara A. Ray 《Algorithmica》1999,25(2-3):239-250

We demonstrate that DNA computers can simulate Boolean circuits with a small overhead. Boolean circuits embody the notion of massively parallel signal processing and are frequently encountered in many parallel algorithms. Many important problems such as sorting, integer arithmetic, and matrix multiplication are known to be computable by small size Boolean circuits much faster than by ordinary sequential digital computers. This paper shows that DNA chemistry allows one to simulate large semi-unbounded fan-in Boolean circuits with a logarithmic slowdown in computation time. Also, for the class NC¹ , the slowdown can be reduced to a constant. In this algorithm we have encoded the inputs, the Boolean AND gates, and the OR gates to DNA oligonucleotide sequences. We operate on the gates and the inputs by standard molecular techniques of sequence-specific annealing, ligation, separation by size, amplification, sequence-specific cleavage, and detection by size. Additional steps of amplification are not necessary for NC¹ circuits. The feasibility of the DNA algorithm has been successfully tested on a small circuit by actual biochemical experiments. Received May 29, 1997; revised February 15, 1998. 相似文献

6.

Revisiting parallel cyclic reduction and parallel prefix-based algorithms for block tridiagonal systems of equations

Sudip K. Seal Kalyan S. Perumalla Steven P. Hirshman 《Journal of Parallel and Distributed Computing》2013

Direct solvers based on prefix computation and cyclic reduction algorithms exploit the special structure of tridiagonal systems of equations to deliver better parallel performance compared to those designed for more general systems of equations. This performance advantage is even more pronounced for block tridiagonal systems. In this paper, we re-examine the performances of these two algorithms taking the effects of block size into account. Depending on the block size, the parameter space spanned by the number of block rows, size of the blocks and the processor count is shown to favor one or the other of the two algorithms. A critical block size that separates these two regions is shown to emerge and its dependence both on problem dependent parameters and on machine-specific constants is established. Empirical verification of these analytical findings is carried out on up to 2048 cores of a Cray XT4 system. 相似文献

7.

Parallel computation and conflicts in memory access

Luděk Kučera 《Information Processing Letters》1982,14(2):93-96

相似文献

8.

A New Class of Depth-Size Optimal Parallel Prefix Circuits

Lin Yen-Chun Shih Chao-Cheng 《The Journal of supercomputing》1999,14(1):39-52

Given n values x₁, x₂, ... ,x_n and an associative binary operation o, the prefix problem is to compute x₁ox₂o··· ox_i, 1in. Many combinational circuits for solving the prefix problem, called prefix circuits, have been designed. It has been proved that the size s(D(n)) and the depth d(D(n)) of an n-input prefix circuit D(n) satisfy the inequality d(D(n))+s(D(n))2n–2; thus, a prefix circuit is depth-size optimal if d(D(n))+s(D(n))=2n–2. In this paper, we construct a new depth-size optimal prefix circuit SL(n). In addition, we can build depth-size optimal prefix circuits whose depth can be any integer between d(SL(n)) and n–1. SL(n) has the same maximum fan-out lgn+1 as Snir's SN(n), but the depth of SL(n) is smaller; thus, SL(n) is faster. Compared with another optimal prefix circuit LYD(n), d(LYD(n))+2d(SL(n))d(LYD(n)). However, LYD(n) may have a fan-out of at most 2 lgn–2, and the fan-out of LYD(n) is greater than that of SL(n) for almost all n12. Because an operation node with greater fan-out occupies more chip area and is slower in VLSI implementation, in most cases, SL(n) needs less area and may be faster than LYD(n). Moreover, it is much easier to design SL(n) than LYD(n). 相似文献

9.

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(4):321-327

Computation of maximal matching of a graph based on Depth First Search tree computation was introduced by Datta and Sen (Parallel Algorithms and Applications, 5, 1995, 161-164). They showed that the approach gives efficient parallel algorithms for maximal matching for graphs for which DFS tree can be efficiently computed. They also presented a parallel scheme to compute a maximal matching in O(T(n) log n) time using O(P(n)) number of processors, where T(n) and P(n) are the time and the number of processors required to compute DFS tree of a graph in parallel. We present here, an improved technique to compute maximal matching in parallel based on DFS tree computation. The algorithm takes O(T(n)) time and O(P(n)) processors. 相似文献

10.

《Optimization methods & software》2012,27(3):485-509

In this article, we perform a comparative study of different heuristics used to design combinational logic circuits. This study mainly emphasizes the use of local search hybridized with a genetic algorithm (GA) and the impact of introducing parallelism. Our results indicate that a hybridization of a GA with a local search algorithm (simulated annealing) is beneficial and that the use of parallelism not only introduces a speedup in the algorithms compared (as expected) but also allows us to improve the quality of the solutions found. 相似文献

11.

Parallel prefix computation on extended multi-mesh network

Prasanta K. Jana B. Damodara NaiduShailendra Kumar Monish AroraBhabani P. Sinha 《Information Processing Letters》2002,84(6):295-303

相似文献

12.

量子小波变换算法及线路实现

朱轩溢孙力《计算机应用与软件》2008,25(9)

随着小波理论研究的深入,以及小波分析在信号分析和图像处理等领域的广泛应用,小波分析在量子计算领域中也越来越受到重视.应用置换矩阵、W-H变换矩阵和量子傅立叶变换矩阵来对Haar小波及D(4)小波变换矩阵进行分解,给出其算法,然后得出其完整的量子逻辑线路图,最后分析其复杂度. 相似文献

13.

基于BDD的组合电路等价性检验

李绍荣徐玉婷《计算机科学》2007,34(3):293-294

本文分析了基于BDD的组合电路等价性检验;讨论了构造输出函数的二叉判定图BDD的不同方法,并分析了BDD间布尔操作的不同的算法的异同;然后给出了一种基于BDD的组合电路等价性检验方法。相似文献

14.

Monotone circuits for monotone weighted threshold functions 总被引：1，自引：0，他引：1

Amos Beimel Enav Weinreb 《Information Processing Letters》2006,97(1):12-18

Weighted threshold functions with positive weights are a natural generalization of unweighted threshold functions. These functions are clearly monotone. However, the naive way of computing them is adding the weights of the satisfied variables and checking if the sum is greater than the threshold; this algorithm is inherently non-monotone since addition is a non-monotone function. In this work we by-pass this addition step and construct a polynomial size logarithmic depth unbounded fan-in monotone circuit for every weighted threshold function, i.e., we show that weighted threshold functions are in mAC¹. (To the best of our knowledge, prior to our work no polynomial monotone circuits were known for weighted threshold functions.)Our monotone circuits are applicable for the cryptographic tool of secret sharing schemes. Using general results for compiling monotone circuits (Yao, 1989) and monotone formulae (Benaloh and Leichter, 1990) into secret sharing schemes, we get secret sharing schemes for every weighted threshold access structure. Specifically, we get: (1) information-theoretic secret sharing schemes where the size of each share is quasi-polynomial in the number of users, and (2) computational secret sharing schemes where the size of each share is polynomial in the number of users. 相似文献

15.

基于流水光总线阵列的快速矩阵运算

陈宏建陈崚秦玲徐晓华《计算机工程与应用》2003,39(27):73-76,79

矩阵运算是最重要的数值计算,基于流水光总线的可重构线性阵列系统(LARPBS)是一种建立在光总线上的并行高效计算模型。该文主要介绍LARPBS模型上的快速并行矩阵运算,从而使人们更加了解光总线计算模型及其优越性,为今后进一步研究光总线模型及其并行算法奠定基础。相似文献

16.

《Computers & Industrial Engineering》2014

The well-known one-dimensional Bin Packing Problem (BPP) of whose variants arise in many real life situations is a challenging NP-Hard combinatorial optimization problem. Metaheuristics are widely used optimization tools to find (near-) optimal solutions for solving large problem instances of BPP in reasonable running times. With this study, we propose a set of robust and scalable hybrid parallel algorithms that take advantage of parallel computation techniques, evolutionary grouping genetic metaheuristics, and bin-oriented heuristics to obtain solutions for large scale one-dimensional BPP instances. A total number of 1318 benchmark problems are examined with the proposed algorithms and it is shown that optimal solutions for 88.5% of these instances can be obtained with practical optimization times while solving the rest of the problems with no more than one extra bin. When the results are compared with the existing state-of-the-art heuristics, the developed parallel hybrid grouping genetic algorithms can be considered as one of the best one-dimensional BPP algorithms in terms of computation time and solution quality. 相似文献

17.

Constructing H4, a Fast Depth-Size Optimal Parallel Prefix Circuit

Lin Yen-Chun Hsu Yao-Hsien Liu Chun-Keng 《The Journal of supercomputing》2003,24(3):279-304

Given n values x ₁, x ₂,...,x _n and an associative binary operation , the prefix problem is to compute x ₁x ₂x _i, 1in. Prefix circuits are combinational circuits for solving the prefix problem. For any n-input prefix circuit D with depth d and size s, if d+s=2n–2, then D is depth-size optimal. In general, a prefix circuit with a small depth is faster than one with a large depth. For prefix circuits with the same depth, a prefix circuit with a smaller fan-out occupies less area and is faster in VLSI implementation. This paper is on constructing parallel prefix circuits that are depth-size optimal with small depth and small fan-out. We construct a depth-size optimal prefix circuit H4 with fan-out 4. It has the smallest depth among all known depth-size optimal prefix circuits with a constant fan-out; furthermore, when n136, its depth is less than, or equal to, those of all known depth-size optimal prefix circuits with unlimited fan-out. A size lower bound of prefix circuits is also derived. Some properties related to depth-size optimality and size optimality are introduced; they are used to prove that H4 is depth-size optimal. 相似文献

18.

An optimal parallel processor bound in strong orientation of an undirected graph

Yung Hyang Tsin 《Information Processing Letters》1985,20(3):143-146

相似文献

19.

Linear lower bounds on unbounded fan-in boolean circuits

Juraj Hromkovič 《Information Processing Letters》1985,21(2):71-74

A practical method for representing Turner Combinators is presented, which needs only O(n log n) space in the worst case for translating lambda expressions of length n. No precomputation is necessary in our translation, which should be contrasted with Burton's proposal. The runtime system can be implemented with virtually no essential change to Turner's reduction machine. 相似文献

20.

改进的乘幂适应度函数在遗传算法中的应用 总被引：1，自引：0，他引：1

杨水清杨加明孙超《计算机工程与应用》2014,(17):40-43,154

在遗传算法优化过程中,引导搜索的主要依据是适应度函数。通过评估常见的几种适应度函数,兼顾保持种群的多样性和算法的收敛性,由乘幂尺度变换,提出了一种改进的乘幂适应度函数。以三个典型的测试函数为例,在相同遗传操作和参数情况下,分别采用常见的与改进的适应度函数进行优化比较。结果表明,所改进的乘幂适应度函数能明显提高算法的收敛精度、收敛速度和收敛稳定性,对提高遗传算法的整体性能有重要的意义。相似文献