期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A new factorization of the mass matrix for optimal serial and parallel calculation of multibody dynamics

Amir Fijany Roy Featherstone 《Multibody System Dynamics》2013,29(2):169-187

This paper describes a new factorization of the inverse of the joint-space inertia matrix M. In this factorization, M ^?1 is directly obtained as the product of a set of sparse matrices wherein, for a serial chain, only the inversion of a block-tridiagonal matrix is needed. In other words, this factorization reduces the inversion of a dense matrix to that of a block-tridiagonal one. As a result, this factorization leads to both an optimal serial and an optimal parallel algorithm, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(logN) on a computer with O(N) processors. The novel feature of this algorithm is that it first calculates the interbody forces. Once these forces are known, the accelerations are easily calculated. We discuss the extension of the algorithm to the task of calculating the forward dynamics of a kinematic tree consisting of a single main chain plus any number of short side branches. We also show that this new factorization of M ^?1 leads to a new factorization of the operational-space inverse inertia, Λ ^?1, in the form of a product involving sparse matrices. We show that this factorization can be exploited for optimal serial and parallel computation of Λ ^?1, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(logN) on a computer with O(N) processors. 相似文献

2.

Decoding of low-density codes with parity-check matrices composed of permutation matrices in an erasure channel

D. K. Zigangirov K. Sh. Zigangirov 《Problems of Information Transmission》2006,42(2):106-113

A lower bound for the number of iteratively correctable erasures is given, with application to the ensemble of LDPC codes with parity-check matrices composed of permutation matrices [1]. We assume that the Zyablov-Pinsker iterative decoding algorithm [2] is used. Its complexity is O(N log N), where N is the block length. 相似文献

3.

A fast pessimistic one-step diagnosis algorithm for hypercube multicomputer systems

《Journal of Parallel and Distributed Computing》2004,64(4):546-553

This paper describes a system-level diagnosis algorithm for hypercube multicomputer systems. The algorithm is based on the PMC model and can isolate all faulty processors to within a set that contains at most one fault-free processor. If we denote by N the total number of processors in a hypercube system to be diagnosed, then, based on the judiciously designed data structures, the algorithm can run in O(Nlog₂N) time; whereas the best-known diagnosis algorithm, the YML algorithm, runs in O(N^2.5) time. Consequently, the new algorithm is remarkably superior to the YML algorithm in terms of the time cost. 相似文献

4.

A faster solution to the horizon problem

J. Dozier J. Bruno P. Downey 《Computers & Geosciences》1981,7(2):145-151

An algorithm is developed for calculating the horizons for each point in a digital terrain grid in order N iterations, whereas all previous methods seem to be of O(N²) time complexity. The new method makes horizon computations reasonable, and ought to improve the accuracy of surface climate models in rugged terrain. 相似文献

5.

An iterative solver for a mixed variable variational formulation of the (first) biharmonic problem

O. Axelsson I. Gustafsson 《Computer Methods in Applied Mechanics and Engineering》1979,20(1):9-16

The coupled system of equations resulting from a mixed variable formulation of the biharmonic problem is solved by a preconditioned conjugate gradient method. The preconditioning matrix is based on an incomplete factorization of a positive definite operator similar to the 13-point difference approximation of the biharmonic operator.The first iterate is already quite accurate even if the initial approximation is not. Hence, often a small number of iterations will suffice to get an accurate enough solution. For smaller iteration errors the number of iterations grows as O(h^?1), h → 0, where h is an average mesh-size parameter. 相似文献

6.

An improved algorithm for Boolean matrix multiplication

Dr. N. Santoro Dr. J. Urrutia 《Computing》1986,36(4):375-382

A new algorithm for computing the product of two arbitraryN×N Boolean matrices is presented. The algorithm requiresO (N ³/logN) bit operations and onlyO(N logN) bits of additional storage. This represents an improvement on the Four Russians' method which requires the same number of operations but usesO(N ³/logN) bits of additional storage. 相似文献

7.

An adaptive global reduction algorithm for wormhole-routed 2D meshes

《Parallel Computing》1997,23(13):1909-1936

This paper presents a global reduction algorithm for wormhole-routed 2D meshes. Well-known reduction algorithms that are optimized for short vectors have complexity O(M log N), where N = n × n is the number of nodes, and M the vector length. Algorithms suitable for long vectors have complexity O(√N + M). Previously known asymptotically optimal algorithms with complexity O(log N + M) incur inherent network contention among constituent messages. The proposed algorithm adapts to the given vector length, resulting in complexities O(M log N) for short vectors, O(log N + M) for medium-sized vectors, and O(√N + M) for sufficiently long vectors. The O(√N + M) version is preferred to the O(log N + M) version for long vectors, due to its small coefficient associated with M, the dominating factor for such vectors. The algorithm is contention-free in a synchronous environment. Under asynchronous execution models, depth contention (contention among message-passing steps) may occur. However, simulation studies show that the effect of depth contention on the actual performance is negligible. 相似文献

8.

A Unified Approach to the Parallel Construction of Search Trees

《Journal of Parallel and Distributed Computing》1995,27(1):71-78

We present a unified parallel algorithm for constructing various search trees. The tree construction is based on a unified scheme, called bottom-level balancing, which constructs a perfectly balanced search tree having a uniform distribution of keys. The algorithm takes O(log log N) time using N/log log N processors on the EREW PRAM model, and O(1) time with N processors on the CREW PRAM model, where N is the number of keys in the tree. 相似文献

9.

Interpolation by geometric algorithm

Takashi Maekawa^{Author Vitae} Yasunori Matsumoto Author VitaeAuthor Vitae 《Computer aided design》2007,39(4):313-323

We present a novel geometric algorithm to construct a smooth surface that interpolates a triangular or a quadrilateral mesh of arbitrary topological type formed by n vertices. Although our method can be applied to B-spline surfaces and subdivision surfaces of all kinds, we illustrate our algorithm focusing on Loop subdivision surfaces as most of the meshes are in triangular form. We start our algorithm by assuming that the given triangular mesh is a control net of a Loop subdivision surface. The control points are iteratively updated globally by a simple local point-surface distance computation and an offsetting procedure without solving a linear system. The complexity of our algorithm is O(mn) where n is the number of vertices and m is the number of iterations. The number of iterations m depends on the fineness of the mesh and accuracy required. 相似文献

10.

The efficiency of the alpha-beta search on trees with branch-dependent terminal node scores

M.M. Newborn 《Artificial Intelligence》1977,8(2):137-153

An analysis of the efficiency of the alpha-beta algorithm is carried out based on a probabilistic model in which terminal node scores depend on random branch values. Explicit expressions are derived for the expected number of terminal nodes scored for the cases of uniform trees of fanout N and of depths 2 and 3. For trees of depth 2, the expected number is of order O(NH_N); for trees of depth 3, the expected number is of order O(N²). An upper bound on the expected number of terminal nodes scored for trees of depth 4 is shown to be no greater than O(N²H_N²) and no less than O(N²). 相似文献

11.

Constant-Time Algorithm for the Euclidean Distance Transform on Reconfigurable Meshes

《Journal of Parallel and Distributed Computing》2001,61(10):1439-1455

The Euclidean distance transform (EDT) is an operation to convert a binary image consisting of black and white pixels to a representation where each pixel has the Euclidean distance of the nearest black pixel. The EDT has many applications in computer vision and image processing. In this paper, we present a constant-time algorithm for computing the EDT of an N×N image on a reconfigurable mesh. Our algorithm has two variants. (i) If the image is initially given in an N×N mesh, one pixel per processor, our algorithm requires an N×N×N mesh for computing the EDT. (ii) If the image is given in an N×N² mesh, each row of the image in the first row of a separate N×N mesh, we can compute the EDT in the same N×N² mesh. The AT² bounds for these two variants are O(N⁴) and O(N³) respectively. The best previously known algorithm (Y. Pan and K. Li, Inform. Sci.120 (1999), 209–221) for this problem assumes input similar to the second variant of our algorithm and runs in constant-time on an N²×N² reconfigurable mesh with an AT² bound of O(N⁴). Hence both variants of our algorithm improve upon the processor complexity of the algorithm in Pan and Li (1999) by a factor of N and the second variant improves upon the AT² complexity by a factor of N. 相似文献

12.

Packet Routing and PRAM Emulation on Star Graphs and Leveled Networks

《Journal of Parallel and Distributed Computing》1994,20(2):145-157

We consider the problem of permutation routing on a star graph, an interconnection network which has better properties than the hypercube. In particular, its degree and diameter are sublogarithmic in the network size. We present optimal randomized routing algorithms that run in O(D) steps (where D is the network diameter) for the worst-case input with high probability. We also show that for the n-way shuffle network with N = nⁿ nodes, there exists a randomized routing algorithm which runs in O(n) time with high probability. Another contribution of this paper is a universal randomized routing algorithm that could do optimal routing for a large class of networks (called leveled networks) which includes the star graph. The associative analysis is also network-independent. In addition, we present a deterministic routing algorithm, for the star graph, which is near optimal. All the algorithms we give are oblivious. As an application of our routing algorithms, we also show how to emulate a PRAM optimally on this class of networks. 相似文献

13.

A fast fault-identification algorithm for bijective connection graphs using the PMC model

Tseng-Kuei Li Chang-Hsiung Tsai Hong-Chun Hsu 《Information Sciences》2012

Diagnosis of reliability is an important topic for interconnection networks. Under the classical PMC model, Dahura and Masson [5] proposed a polynomial time algorithm with time complexity O(N^2.5) to identify all faulty nodes in an N-node network. This paper addresses the fault diagnosis of so called bijective connection (BC) graphs including hypercubes, twisted cubes, locally twisted cubes, crossed cubes, and Möbius cubes. Utilizing a helpful structure proposed by Hsu and Tan [20] that was called the extending star by Lin et al. [24], and noting the existence of a structured Hamiltonian path within any BC graph, we present a fast diagnostic algorithm to identify all faulty nodes in O(N) time, where N = 2ⁿ, n ? 4, stands for the total number of nodes in the n-dimensional BC graph. As a result, this algorithm is significantly superior to Dahura–Masson’s algorithm when applied to BC graphs. 相似文献

14.

PetRBF — A parallel O(N) algorithm for radial basis function interpolation with Gaussians

Rio Yokota L.A. Barba Matthew G. Knepley 《Computer Methods in Applied Mechanics and Engineering》2010,199(25-28):1793-1804

We have developed a parallel algorithm for radial basis function (rbf) interpolation that exhibits O(N) complexity, requires O(N) storage, and scales excellently up to a thousand processes. The algorithm uses a gmres iterative solver with a restricted additive Schwarz method (rasm) as a preconditioner and a fast matrix-vector algorithm. Previous fast rbf methods — achieving at most O(NlogN) complexity — were developed using multiquadric and polyharmonic basis functions. In contrast, the present method uses Gaussians with a small variance with respect to the domain, but with sufficient overlap. This is a common choice in particle methods for fluid simulation, our main target application. The fast decay of the Gaussian basis function allows rapid convergence of the iterative solver even when the subdomains in the rasm are very small. At the same time we show that the accuracy of the interpolation can achieve machine precision. The present method was implemented in parallel using the petsc library (developer version). Numerical experiments demonstrate its capability in problems of rbf interpolation with more than 50 million data points, timing at 106 s (19 iterations for an error tolerance of 10^? 15) on 1024 processors of a Blue Gene/L (700 MHz PowerPC processors). The parallel code is freely available in the open-source model. 相似文献

15.

Efficient region segmentation on compressed gray images using quadtree and shading representation

Kuo-Liang Chung Author Vitae Hsu-Lien Huang Author Vitae Author Vitae 《Pattern recognition》2004,37(8):1591-1605

Image segmentation, which partitions the image into homogeneous regions, is a fundamental operation in image processing. Suppose the input gray image with size N×N has been compressed into the compressed image via quadtree and shading representation. Assume that the number of blocks in the representation is B, commonly B<N² due to the compression effect. This paper first derives some closed forms for computing the mean/variance of any block and for calculating the two statistical measures of any merged region in O(1) time. It then presents an efficient O(Bα(B))-time algorithm for performing region segmentation on the compressed image directly where α(B) is the inverse of the Ackerman's function and is a very slowly growing function. With the same time complexity, our results extend the pioneering results by Dillencourt and Samet from the map image to the gray image. In addition, with four real images, experimental results show that our proposed algorithm has about 55.4% execution time improvement ratio when compared to the previous fastest region-segmentation algorithm by Fiorio and Gustedt whose O(N²)-time algorithm is run on the original N×N gray image. 相似文献

16.

Parallel Sorting Algorithm Using Multiway Merge and Its Implementation on a Multi-Mesh Network

《Journal of Parallel and Distributed Computing》2000,60(7):891-907

In this paper, we present a parallel sorting algorithm using the technique of multi-way merge. This algorithm, when implemented on a t dimensional mesh having n^t nodes (t>2), sorts n^t elements in O((t²−3t+2) n) time, thus offering a better order of time complexity than the [((t²−t) n log n)/2+O(nt)]-time algorithm of P. F. Corbett and I. D. Scherson (1992, IEEE Trans. Parallel Distrib. Systems3, 626–632). Further, the proposed algorithm can also be implemented on a Multi-Mesh network (1999, D. Das, M. De, and B. P. Sinha, IEEE Trans. Comput.48, 536–551) to sort N elements in 54N^1/4+o(N^1/4) steps, which shows an improvement over 58N^1/4+o(N^1/4) steps needed by the algorithm in (1997, M. De, D. Das, M. Ghosh, and P. B. Sinha, IEEE Trans. Comput.46, 1132–1137). 相似文献

17.

A probabilistic inference scheme for hierarchical buggy models

《International journal of man-machine studies》1993,38(5):857-872

The probabilistic reasoning scheme of Dempster-Shafer theory provides a remarkably efficient bug identification algorithm for a hierarchical Buggy model. In the particular Buggy model generated by the repair theory of Brown & Van Lehn (1980, A generative theory of bugs in procedural skills, Cognitive Science, 2, 155-192), both Shafer & Logan (1987, Implementing Dempster's rule for hierarchical evidence, Artificial Intelligence, 33 , 271-298) and Gordon & Shortliffe (1985, A method for managing evidential reasoning in a hierarchical hypothesis space, Artificial Intelligence, 26, 324-357) schemes have provided almost identical computational accuracy although the latter involves an approximation of a "smallest superset". If n denotes the number of bugs to be identified, the computational complexity of the two schemes, originally of O (n^4/3) and O(n²) respectively, can be improved to O(n) using the simplified top-down calculation scheme whereby from among all the nodes we first locate the particular "parental" node to which the bug belongs and then the bug itself among the sibs within the node. On average, about 5-7 problems are adequate to raise the belief function of the bug to 95% level, based on the evidential reasoning schemes. 相似文献

18.

Solving single-product economic lot-sizing problem with non-increasing setup cost,constant capacity and convex inventory cost in O(N log N) time

Yi Feng Shaoxiang Chen Arun Kumar Bing Lin 《Computers & Operations Research》2011

This paper considers an economic lot sizing model with constant capacity, non-increasing setup cost, and convex inventory cost function. Algorithms with computational time of O(N×TD_N)have been developed for solving the model, where N is the number of planning periods and TD_N is the total demand. This study partially characterizes the optimal planning structure of the model. A new efficient algorithm with computational time of O(N log N) has also been developed based on the partial optimal structure. Moreover, computational study demonstrates that the new algorithm is efficient. 相似文献

19.

A fast pessimistic diagnosis algorithm for generalized hypercube multicomputer systems

Dyi-Rong Duh Chien-Hong Chen Keh-Ning Chang 《The Journal of supercomputing》2012,61(3):605-618

The reliability of processors is an important issue for designing a massively parallel processing system for which fault-tolerant computing is crucial. In order to achieve high system reliability and availability, a faulty processor (node) when found should be replaced by a fault-free processor. Within a multiprocessor system, the technique of identifying faulty nodes by constructing tests on the nodes and interpreting the test outcomes is known as system-level diagnosis. The topological structure of a multicomputer system can be modeled by a graph of which the vertices and edges correspond to nodes and links of the system, respectively. This work presents a system-level diagnosis algorithm for a generalized hypercube which is an attractive variance of a hypercube. The proposed algorithm is based on the PMC model and can isolate all faulty nodes to within a set which contains at most one fault-free node. If the total number of nodes to be diagnosed in a generalized hypercube is N, the proposed algorithm can run in O(Nlog?N) time, and being superior to Yang??s algorithm proposed in 2004, it can diagnose not only a hypercube but also a generalized hypercube. 相似文献

20.

Computing the Map of Geometric Minimal Cuts

Jinhui Xu Lei Xu Evanthia Papadopoulou 《Algorithmica》2014,68(4):805-834

In this paper we consider the following problem of computing a map of geometric minimal cuts (called MGMC problem): Given a graph G=(V,E) and a planar rectilinear embedding of a subgraph H=(V _H,E _H) of G, compute the map of geometric minimal cuts induced by axis-aligned rectangles in the embedding plane. The MGMC problem is motivated by the critical area extraction problem in VLSI designs and finds applications in several other fields. In this paper, we propose a novel approach based on a mix of geometric and graph algorithm techniques for the MGMC problem. Our approach first shows that unlike the classic min-cut problem on graphs, the number of all rectilinear geometric minimal cuts is bounded by a low polynomial, O(n ³). Our algorithm for identifying geometric minimal cuts runs in O(n ³logn(loglogn)³) expected time which can be reduced to O(nlogn(loglogn)³) when the maximum size of the cut is bounded by a constant, where n=|V _H|. Once geometric minimal cuts are identified we show that the problem can be reduced to computing the L _∞ Hausdorff Voronoi diagram of axis aligned rectangles. We present the first output-sensitive algorithm to compute this diagram which runs in O((N+K)log² NloglogN) time and O(Nlog² N) space, where N is the number of rectangles and K is the complexity of the Hausdorff Voronoi diagram. Our approach settles several open problems regarding the MGMC problem. 相似文献