
1.

Fast parallel Preconditioned Conjugate Gradient algorithms for robot manipulator dynamics simulation

Amir Fijany, Robert E. Scheid 《Journal of Intelligent and Robotic Systems》1994,9(1-2):73-99

In this paper, fast parallel Preconditioned Conjugate Gradient (PCG) algorithms for the robot manipulator forward dynamics (dynamic simulation) problem are presented. By exploiting the inherent structure of the forward dynamics problem, suitable preconditioners are devised to accelerate the iterations. In addition, based on the choice of preconditioners, a modified dynamic formulation is used to speed up both the serial and the parallel computation of each iteration. The implementation of the parallel algorithms on two interconnected processor arrays is discussed, and their computation and communication complexities are analyzed. Simulation results for a PUMA arm are presented to illustrate the effectiveness of the proposed preconditioners. With faster convergence due to preconditioning and faster computation of each iteration due to parallelization, the developed parallel PCG algorithms represent the fastest alternative for parallel computation of the problem with O(n) processors.
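The PCG iteration referred to above can be sketched as follows. This is a minimal, generic illustration, not the authors' algorithm: it solves M q̈ = b for a symmetric positive-definite matrix M (such as a joint-space inertia matrix) and, as an assumption, uses a simple Jacobi (diagonal) preconditioner in place of the structure-exploiting preconditioners devised in the paper. The helper names `pcg`, `matvec`, and `dot` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product a x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    """Euclidean inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg(a: Matrix, b: Vector, tol: float = 1e-10, max_iter: int = 100) -> Tuple[Vector, int]:
    """Solve a x = b (a symmetric positive definite) with Jacobi-preconditioned CG.

    Returns the approximate solution and the number of iterations used.
    ASSUMPTION: the diagonal (Jacobi) preconditioner is a generic stand-in.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # applies P^{-1} with P = diag(a)
    x = [0.0] * n
    r = list(b)                                   # residual b - a x for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x, 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        # New search direction, conjugate to the previous ones.
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: a small SPD "inertia-like" matrix and a right-hand side.
M = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
b = [1.0, 2.0, 3.0]
qdd, iterations = pcg(M, b)
```

Each iteration is dominated by one matrix-vector product, so a preconditioner that cuts the iteration count and a parallel evaluation of each iteration are two independent ways to reduce total time.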

2.

Fast implementation of dense stereo vision algorithms on a highly parallel SIMD architecture

Fouzhan Hosseini, Amir Fijany, Saeed Safari, Jean-Guy Fontaine 《Journal of Real-Time Image Processing》2013,8(4):421-435

In this paper, we present a faster-than-real-time implementation of a class of dense stereo vision algorithms on a low-power, massively parallel SIMD architecture, the CSX700. With two cores, each with 96 processing elements (PEs), this SIMD architecture provides a peak computation power of 96 GFLOPS while consuming only 9 W, making it an excellent candidate for embedded computing applications. Exploiting the full features of this architecture, we have developed schemes for an efficient parallel implementation with minimal overhead. For the sum of squared differences (SSD) algorithm on VGA (640 × 480) images with disparity ranges of 16 and 32, we achieve 179 and 94 frames per second (fps), respectively. For HDTV (1,280 × 720) images with disparity ranges of 16 and 32, we achieve 67 and 35 fps, respectively. We have also implemented more accurate, and hence more computationally expensive, variants of the SSD, and in most cases, particularly for VGA images, we have achieved faster-than-real-time performance. Our results demonstrate that, with carefully developed parallelization schemes, the CSX architecture can provide excellent performance and flexibility for various embedded vision applications.
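The SSD matching cost behind these results can be illustrated with a minimal, sequential winner-take-all sketch. It is not the CSX700 implementation: images are assumed to be rectified grayscale arrays given as Python lists, the window size and the handling of border pixels (left at disparity 0) are illustrative choices, and `ssd_disparity` is a hypothetical name.

```python
import random
from typing import List

Image = List[List[float]]


def ssd_disparity(left: Image, right: Image, max_disp: int, radius: int = 1) -> List[List[int]]:
    """Winner-take-all SSD block matching with the left image as reference.

    For each pixel, tests disparities d = 0 .. max_disp-1 and keeps the d whose
    (2*radius+1)^2 window in the right image, shifted left by d, gives the
    smallest sum of squared differences. Border pixels stay at disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        # Start far enough right that the shifted window stays inside the image.
        for x in range(radius + max_disp - 1, w - radius):
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp):
                cost = 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
                        cost += diff * diff
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


# Usage: a synthetic random-texture pair whose right view is shifted by 3 pixels.
rng = random.Random(0)
left = [[rng.random() for _ in range(20)] for _ in range(10)]
right = [[row[x + 3] if x + 3 < 20 else 0.0 for x in range(20)] for row in left]
disparity = ssd_disparity(left, right, max_disp=8)
```

Every (pixel, disparity) cost is computed independently of the others, which is the kind of data parallelism a SIMD array of processing elements can exploit.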

3.

A new factorization of the mass matrix for optimal serial and parallel calculation of multibody dynamics

Amir Fijany, Roy Featherstone 《Multibody System Dynamics》2013,29(2):169-187

This paper describes a new factorization of the inverse of the joint-space inertia matrix M. In this factorization, M^{-1} is obtained directly as the product of a set of sparse matrices wherein, for a serial chain, only the inversion of a block-tridiagonal matrix is needed. In other words, this factorization reduces the inversion of a dense matrix to that of a block-tridiagonal one. As a result, this factorization leads to both an optimal serial and an optimal parallel algorithm, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(log N) on a computer with O(N) processors. The novel feature of this algorithm is that it first calculates the interbody forces. Once these forces are known, the accelerations are easily calculated. We discuss the extension of the algorithm to the task of calculating the forward dynamics of a kinematic tree consisting of a single main chain plus any number of short side branches. We also show that this new factorization of M^{-1} leads to a new factorization of the operational-space inverse inertia, Λ^{-1}, in the form of a product involving sparse matrices. We show that this factorization can be exploited for optimal serial and parallel computation of Λ^{-1}, that is, a serial algorithm with a complexity of O(N) and a parallel algorithm with a time complexity of O(log N) on a computer with O(N) processors.
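The abstract reduces inverting a dense matrix to solving a block-tridiagonal system. As a simplified illustration, assuming scalar (1 × 1) blocks, such a system can be solved serially in O(N) operations with the Thomas algorithm sketched below; the block version replaces each scalar division with a small matrix solve. This is a generic solver, not the factorization from the paper, and `solve_tridiagonal` is a hypothetical helper. The O(log N) parallel algorithm mentioned in the abstract would require a different technique and is not shown.

```python
from typing import List


def solve_tridiagonal(lower: List[float], diag: List[float],
                      upper: List[float], rhs: List[float]) -> List[float]:
    """Thomas algorithm: O(N) solve of a tridiagonal system.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Assumes no pivoting is needed (e.g. diagonally dominant or SPD matrices).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal after forward elimination
    d = [0.0] * n  # modified right-hand side after forward elimination
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Usage: tridiag(-1, 4, -1) with a right-hand side built from x = [1, 2, 3, 4, 5].
x = solve_tridiagonal([0.0, -1.0, -1.0, -1.0, -1.0], [4.0] * 5,
                      [-1.0, -1.0, -1.0, -1.0, 0.0], [2.0, 4.0, 6.0, 8.0, 16.0])
```

Both sweeps touch each row once, which is where the O(N) serial cost comes from.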

4.

Time-parallel solution of linear partial differential equations on the Intel Touchstone Delta supercomputer

Nikzad Toomarian, Amir Fijany, Jacob Barhen 《Concurrency and Computation》1994,6(8):641-652

This paper presents the implementation of a new class of massively parallel algorithms for solving certain time-dependent partial differential equations (PDEs) on massively parallel supercomputers. Such PDEs are usually solved numerically by discretization in time and space, applying a time-stepping procedure to data and algorithms that may be parallelized in the spatial domain. In a radical departure from this strictly sequential temporal paradigm, we have developed the concept of time-parallel algorithms, which allows the marching in time to be fully parallelized. This is achieved by a set of transformations based on the eigenvalue-eigenvector decomposition of the matrices involved in the discrete formalism. Our time-parallel algorithms possess a highly decoupled structure and can therefore be implemented efficiently on emerging massively parallel, high-performance supercomputers with minimal communication and synchronization overhead. We have carried out a successful proof-of-concept demonstration of the basic ideas using a two-dimensional heat equation example implemented on the Intel Touchstone Delta supercomputer. Our results indicate that linear, and even superlinear, speed-up can be achieved and maintained for a very large number of processor nodes.
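The idea of removing the sequential dependence between time steps through an eigendecomposition can be shown on a small example. Assuming a 1D heat equation u_t = u_xx with zero Dirichlet boundaries and an explicit update u^{k+1} = A u^k (the paper uses a two-dimensional example and its own transformations), A has a known orthonormal sine eigenbasis V, so u^k = V Λ^k V^T u^0 can be evaluated for any k without computing the earlier steps. All function names are hypothetical.

```python
import math
from typing import List


def sine_basis(n: int) -> List[List[float]]:
    """Orthonormal eigenvectors of tridiag(1, -2, 1) with zero boundaries (one row per mode)."""
    s = math.sqrt(2.0 / (n + 1))
    return [[s * math.sin(j * math.pi * (i + 1) / (n + 1)) for i in range(n)]
            for j in range(1, n + 1)]


def eigenvalues(n: int, r: float) -> List[float]:
    """Eigenvalues of the explicit update A = I + r * tridiag(1, -2, 1)."""
    return [1.0 - 4.0 * r * math.sin(j * math.pi / (2 * (n + 1))) ** 2
            for j in range(1, n + 1)]


def step_sequential(u: List[float], r: float) -> List[float]:
    """One ordinary time step u <- A u, with zero values outside the grid."""
    n = len(u)
    return [u[i] + r * ((u[i - 1] if i > 0 else 0.0) - 2.0 * u[i]
                        + (u[i + 1] if i < n - 1 else 0.0)) for i in range(n)]


def solution_at_step(u0: List[float], r: float, k: int) -> List[float]:
    """Compute u^k directly from u^0, independently of every other time step."""
    n = len(u0)
    modes = sine_basis(n)
    lam = eigenvalues(n, r)
    coeffs = [sum(v[i] * u0[i] for i in range(n)) for v in modes]  # V^T u0
    return [sum(coeffs[j] * lam[j] ** k * modes[j][i] for j in range(n))
            for i in range(n)]


# Usage: r = dt / h^2; each step k below could be assigned to a different processor.
u0 = [float((i * 3) % 5) for i in range(8)]
all_steps = [solution_at_step(u0, 0.4, k) for k in range(1, 11)]
```

Because `solution_at_step` for one k never reads the result for another k, the calls for k = 1, ..., K are mutually independent; `step_sequential` is kept only as a reference. For this explicit update to be stable, r = Δt/h² should not exceed 1/2.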

5.

An efficient algorithm for computation of manipulator inertia matrix

Amir Fijany, Antal K. Bejczy 《Journal of Field Robotics》 (then Journal of Robotic Systems) 1990,7(1):57-80

In this article, an efficient algorithm for computing the manipulator inertia matrix is presented. The algorithm is derived from Newton's and Euler's laws governing the motion of rigid bodies. Using spatial notation, the algorithm leads to the definition of the composite rigid-body spatial inertia, which is a spatial representation of the notion of an augmented body. The equations resulting from this algorithm are derived in a coordinate-free form. The choice of the coordinate frame for projecting the coordinate-free equations, that is, the intrinsic equations, is discussed by analyzing the vectors and tensors involved in the final equations. Previously proposed algorithms, the physical interpretations leading to their derivation, and the redundancy in their computations are analyzed. The developed algorithm achieves greater efficiency by eliminating the redundancy in the intrinsic equations and by a suitable choice of coordinate frame for projecting them.
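The composite rigid-body idea can be illustrated in the plane. The sketch below assumes a planar serial chain of revolute joints (the article works in 3D spatial notation, which is not reproduced here). It accumulates, from the tip inward, the mass Mc_j, centre of mass R_j, and rotational inertia Ic_j of each composite body made of links j to N, then reads off the inertia matrix as H_ij = Ic_j + Mc_j (R_j - p_i)·(R_j - p_j) for i ≤ j, where p_i is the position of joint i. The parameterization and the function name `planar_mass_matrix` are hypothetical.

```python
import math
from typing import List, Sequence


def planar_mass_matrix(q: Sequence[float], lengths: Sequence[float], com: Sequence[float],
                       masses: Sequence[float], inertias: Sequence[float]) -> List[List[float]]:
    """Joint-space inertia matrix of a planar chain of revolute joints.

    Hypothetical parameterization: link i has length lengths[i], its centre of
    mass lies com[i] along the link, mass masses[i], and rotational inertia
    inertias[i] about its own centre of mass.
    """
    n = len(q)
    # Forward kinematics: joint positions p[i] and link centres of mass r[i].
    p, r = [], []
    x = y = theta = 0.0
    for i in range(n):
        theta += q[i]
        p.append((x, y))
        r.append((x + com[i] * math.cos(theta), y + com[i] * math.sin(theta)))
        x += lengths[i] * math.cos(theta)
        y += lengths[i] * math.sin(theta)

    # Composite bodies (links j..n-1), accumulated from the tip inward.
    mc, rc, ic = [0.0] * n, [(0.0, 0.0)] * n, [0.0] * n
    for j in range(n - 1, -1, -1):
        m, (rx, ry) = masses[j], r[j]
        if j == n - 1:
            mc[j], rc[j], ic[j] = m, (rx, ry), inertias[j]
            continue
        m1, (bx, by) = mc[j + 1], rc[j + 1]
        mc[j] = m1 + m
        cx, cy = (m1 * bx + m * rx) / mc[j], (m1 * by + m * ry) / mc[j]
        rc[j] = (cx, cy)
        # Parallel-axis theorem moves both pieces to the new composite centre of mass.
        ic[j] = (ic[j + 1] + m1 * ((bx - cx) ** 2 + (by - cy) ** 2)
                 + inertias[j] + m * ((rx - cx) ** 2 + (ry - cy) ** 2))

    # H_ij (i <= j) depends only on composite body j; H is symmetric.
    h = [[0.0] * n for _ in range(n)]
    for j in range(n):
        cx, cy = rc[j]
        for i in range(j + 1):
            val = ic[j] + mc[j] * ((cx - p[i][0]) * (cx - p[j][0])
                                   + (cy - p[i][1]) * (cy - p[j][1]))
            h[i][j] = h[j][i] = val
    return h


# Usage: a two-link arm.
H = planar_mass_matrix(q=(0.3, 0.7), lengths=(1.0, 1.5), com=(0.5, 0.7),
                       masses=(1.0, 2.0), inertias=(0.1, 0.2))
```

For a two-link arm this reproduces the familiar closed-form entries, for example H[1][1] = I_2 + m_2 c_2², where c_2 is the distance from joint 2 to the second link's centre of mass.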

6.

A massively parallel computation strategy for FDTD: time and space parallelism applied to electromagnetics problems

A. Fijany, M. A. Jensen, Y. Rahmat-Samii, J. Barhen 《IEEE Transactions on Antennas and Propagation》1995,43(12):1441-1449

We present a novel strategy for incorporating massive parallelism into the solution of Maxwell's equations using finite-difference time-domain (FDTD) methods. In a departure from previous techniques, which exploit spatial parallelism, our approach exploits massive temporal parallelism by computing all of the time steps in parallel. Furthermore, in contrast to other methods, which appear to concentrate on explicit schemes such as Yee's (1966) algorithm, our strategy uses the implicit Crank-Nicolson technique, which provides superior numerical properties. We show that the use of temporal parallelism results in algorithms that offer a massive degree of coarse-grain parallelism with minimal communication and synchronization requirements. Owing to these features, the time-parallel algorithms are particularly suitable for implementation on emerging massively parallel multiple-instruction, multiple-data (MIMD) architectures. The methodology is applied to a circular cylindrical configuration, which serves as a testbed problem for the approach, to demonstrate the massive parallelism that can be exploited. We also discuss the generalization of the methodology to more complex problems.
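The stability contrast between implicit Crank-Nicolson and a simple explicit update can be seen from their amplification factors for a single eigenmode du/dt = λu. For lossless electromagnetic modes λ is purely imaginary, λ = iω, and the Crank-Nicolson factor g = (1 + λΔt/2)/(1 - λΔt/2) has modulus exactly 1 for every Δt, whereas forward Euler, g = 1 + λΔt, always amplifies such a mode. Forward Euler is used here only as the simplest explicit contrast: Yee's leapfrog scheme is a different explicit method that is stable under a Courant (CFL) time-step limit. Because the Crank-Nicolson update matrix is a rational function of the system matrix, it shares that matrix's eigenvectors, so each mode after k steps is simply multiplied by g^k. The function names below are hypothetical.

```python
def crank_nicolson_gain(lam: complex, dt: float) -> complex:
    """Per-step amplification of Crank-Nicolson for du/dt = lam * u."""
    return (1 + 0.5 * lam * dt) / (1 - 0.5 * lam * dt)


def forward_euler_gain(lam: complex, dt: float) -> complex:
    """Per-step amplification of forward Euler for du/dt = lam * u."""
    return 1 + lam * dt


def mode_after_steps(u0: complex, lam: complex, dt: float, k: int) -> complex:
    """Crank-Nicolson value of one eigenmode after k steps, computed directly."""
    return u0 * crank_nicolson_gain(lam, dt) ** k


# Usage: compare gain magnitudes for lossless (purely oscillatory) modes.
dt = 0.1
for omega in (1.0, 10.0, 1000.0):
    lam = 1j * omega
    print(omega, abs(crank_nicolson_gain(lam, dt)), abs(forward_euler_gain(lam, dt)))
```

`mode_after_steps` shows why this pairs naturally with temporal parallelism: the value at step k depends only on u0 and k, not on the intermediate steps.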

7.

Parallel computation of manipulator inverse dynamics

Amir Fijany, Antal K. Bejczy 《Journal of Field Robotics》 (then Journal of Robotic Systems) 1991,8(5):599-635

In this article, parallel computation of manipulator inverse dynamics is investigated. A hierarchical graph-based mapping approach is devised to analyze the inherent parallelism in the Newton-Euler formulation at several computational levels and to derive the features of an abstract architecture for exploiting that parallelism. At each level, a parallel algorithm represents the application of a parallel model of computation that transforms the computation into a graph whose structure defines the features of an abstract architecture, i.e., the number of processors, communication structure, etc. Data flow analysis is employed to derive the lower bound on computation time as well as the sequencing of the abstract architecture. The features of the target architecture are defined by optimizing the abstract architecture to exploit maximum parallelism while minimizing various overheads and architectural complexity. An algorithmically specialized, highly parallel MIMD-SIMD architecture is designed and implemented that can efficiently exploit parallelism at several computational levels. The measured computation time of the Newton-Euler formulation for a general 6-degree-of-freedom (dof) manipulator is 187 μs. Each additional dof adds 23 μs, which leads to a computation time of less than 500 μs even for a 12-dof redundant arm.
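Assuming the 23 μs increment per additional dof holds linearly beyond 6 dof (an extrapolation the abstract implies but does not write as a formula), the 12-dof figure works out as follows, comfortably within the stated bound:

```latex
\begin{aligned}
T(n)  &\approx 187\,\mu\mathrm{s} + 23\,\mu\mathrm{s}\,(n - 6), \qquad n \ge 6,\\
T(12) &\approx 187\,\mu\mathrm{s} + 6 \times 23\,\mu\mathrm{s} = 325\,\mu\mathrm{s} < 500\,\mu\mathrm{s}.
\end{aligned}
```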