期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel B-Spline Surface Interpolation on a Mesh-Connected Processor Array

Cheng F. H. Wasilkowski G. W. Wang J. Y. Zhang C. M. Wang W. P. 《Journal of Parallel and Distributed Computing》1995,24(2)

A parallel implementation of Chebyshev method is presented for the B-spline surface interpolation problem. The algorithm finds the control points of a uniform bicubic B-spline surface that interpolates m × n data points on an m × n mesh-connected processor array in constant time. Hence it is optimal. Due to its numerical stability, the algorithm can successfully be used in finite precision floating-point arithmetic. 相似文献

2.

Recursive Algorithms of Parallel Matrix Computations

E. G. Sukhov 《Automation and Remote Control》2001,62(11):1924-1929

Consideration was given to the recursive approach to the block algorithms of linear algebra. The problem of LL _T-decomposition (quadratic root) was used by way of example. Computational complexity was estimated both in terms of arithmetic floating-point operations and data-transfer operations required to generate recursive structures. The main area of application of the algorithms is solution of large-scale problems on parallel and distributed computer systems. 相似文献

3.

Coding Techniques for Fault-Tolerant Parallel Prefix Computations in Abelian Groups 总被引：1，自引：0，他引：1

Hadjicostis Christoforos N. 《Computer Journal》2004,47(3):329-341

相似文献

4.

Parallel B-Spline Surface Fitting on Mesh-Connected Computers

Kuo-Liang Chung Wen-Ming Yan 《Journal of Parallel and Distributed Computing》1996,35(2):205

The solution of uniform bicubic B-spline curve/surface fitting problem is considered. Based on the matrix perturbation method, this paper first presents a novel approximateO(n/p)-time parallel B-spline curve fitting algorithm for finding the correspondingncontrol points that interpolate thosendata points on a linear array processor withpprocessors, wherep≤n. Givenm×ndata points, we then present anO(mn/(p₁p₂))-time parallel algorithm for solving the uniform bicubic B-spline surface fitting problem on ap₁×p₂mesh-connected computer, wherep₁≤mandp₂≤n. The relative error analyses of our two stable and cost-optimal parallel solvers are also given. When settingp₁=mandp₂=n, a constant-time parallel solver for B-spline surface fitting can be derived; this time- and cost-optimal result is a direct method, in contrast to the parallel iterative method of Chenget al.(Parallel B-spline surface interpolation on a mesh-connected processor array,J. Parallel Distrib. Comput.24, 2 (1995), 224–229). 相似文献

5.

Recursive Least-Squares Problems on Distributed-Memory Multiprocessors

《Journal of Parallel and Distributed Computing》1995,24(1):11-26

We discuss implementations of block algorithms for recursive least squares (RLS for short) problems on ring distributed-memory multiprocessors. We consider the sliding rectangular window case which involves triangularization followed by updating and downdating of the data matrix. We compare several schemes for computing the current least-squares solution, including a direct back-substitution scheme and a scheme where the previous solution vector is updated to the current solution vector by adding the so-called Kalman gain vector. The techniques are implemented on a linear array of transputers and on the Intel iPSC/2 hypercube, and evaluated with respect to their execution time and numerical accuracy. 相似文献

6.

Multisearch Techniques: Parallel Data Structures on Mesh-Connected Computers

《Journal of Parallel and Distributed Computing》1994,20(1):1-13

The multisearch problem is defined as follows. Given a data structure D modeled as a graph with n constant-degree nodes, perform O(n) searches on D. Let r be the length of the longest search path associated with a search process, and assume that the paths are determined "on-line." That is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in O([formula] + r ([formula]/log n)) time on a [formula] × [formula]n mesh-connected computer. For many data structures, the search path traversed when answering one search query has length r = O(log n). For these cases, our algorithm processes O(n) such queries in asymptotically optimal Θ([formula]) time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel on-line tree traversals, as well as hierarchical representations of polyhedra and its myriad of applications (line-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull). 相似文献

7.

Embedding Hypercubes and Related Networks into Mesh-Connected Processor Arrays

《Journal of Parallel and Distributed Computing》1994,23(1):72-79

In this paper we study the problem of how computations programmed for hypercubes, and their bounded-degree relatives, the shuffle-exchange and cube-connected-cycles, can be efficiently emulated by mesh-connected arrays of processing elements. The emulations we present are implemented via graph embeddings. The graph embeddings we present are all optimal with respect to congestion, and are also noteworthy for they yield efficient emulation programs that are written in a SIMD style without the use of indirect addressing. The paper includes the three following emulation results, where each emulation algorithm requires no indirection. For any n ≥ 1 and fixed r ≥ 1, an n-sided r-dimensional mesh can emulate an n2ⁿ-node cube-connected-cycles network with slowdown Θ(2ⁿ/n^r−1), a 2ⁿ-node shuffle-exchange network with slowdown Θ(2ⁿ/n^r), and a 2ⁿ-node hypercube network operating in a weak model, with slowdown Θ(2ⁿ log n/n^r). 相似文献

8.

A Parallel Algorithm for Volume Projections on SIMD Mesh-Connected Computers

Li Hongzheng Shi Hongchi 《The Journal of supercomputing》2000,17(1):5-22

Projections are widely used in machine vision, volume rendering, and computer graphics. For applications with 3D volume data, we design a parallel projection algorithm on SIMD mesh-connected computers and implement the algorithm on the Parallel Algebraic Logic (PAL) computer. The algorithm is a parallel ray casting algorithm for both orthographic and perspective projections. It decomposes a volume projection into two transformations that can be implemented in the SIMD fashion to solve the data distribution and redistribution problem caused by non-regular data access patterns in volume projections. 相似文献

9.

Hardware-Efficient and Highly Reconfigurable 4- and 2-Track Fault-Tolerant Designs for Mesh-Connected Arrays

《Journal of Parallel and Distributed Computing》2001,61(10):1391-1411

We consider m-track models for constructing fault-tolerant (FT) mesh systems which have one primary and m spare tracks per row and column, switches at the intersection of these tracks, and spare processors at the boundaries. A faulty system is reconfigured by finding for each fault u a reconfiguration path from the fault to a spare in which, starting from the fault u, a processor is replaced or “covered” by the nearest “available” succeeding processor on the path—a processor on the path is not available if it is faulty or is used as a “cover” on some other reconfiguration path. In previous work, a 1-track design that can support any set of node-disjoint straight reconfiguration paths, and a more reliable 3-track design that can support any set of node-disjoint rectilinear reconfiguration paths have been proposed. In this research note, we present: (1) A fundamental result regarding the universality of simple “one-to-one switches” in m-track 2-D mesh designs in terms of their reconfigurabilities. (2) A 4-track mesh design that can support any set of edge-disjoint (a much less restrictive criterion than node-disjointness) rectilinear reconfiguration paths, and that has 34% less switching overhead and significantly higher, actually close-to-optimal, reconfigurability compared to the previously proposed 3-track design. (3) A new 2-track design derived from the above 4-track design that we show can support the same set of reconfiguration paths as the previous 3-track design but with 33% less wiring overhead. (4) Results on the deterministic fault tolerance capabilities (the number of faults guaranteed reconfigurable) of our 4- and 2-track designs, and the previously proposed 1- and 3-track designs. 相似文献

10.

基于最小二乘法的跑道摩擦系数测试方法

史艳红王立文《微计算机信息》2009,25(23)

介绍了车载跑道摩擦系数测试设备的测量原理和嵌入式PC控制系统组成,分析了测量过程中跑道摩擦系数测量载体-测量轮所受垂直正压力和水平摩擦力产生波动的影响因素.为消除水平力、垂直力波动对跑道摩擦系数测量结果带来的误差,采用了渐消记忆递推最小二乘数据处理方法.结果表明该方法可消除测量过程中大部分中高频干扰信号,使单次测量数据精度提高,保证摩擦系数测量结果的准确性. 相似文献

11.

Robust Parallel Computations through Randomization

S. C. Kontogiannis G. E. Pantziou P. G. Spirakis M. Yung 《Theory of Computing Systems》2000,33(5-6):427-464

In this paper we present an efficient general simulation strategy for computations designed for fully operational bsp machines of n ideal processors, on n -processor dynamic-fault-prone bsp machines. The fault occurrences are fail-stop and fully dynamic, i.e., they are allowed to happen on-line at any point of the computation, subject to the constraint that the total number of faulty processors may never exceed a known fraction. The computational paradigm can be exploited for robust computations over virtual parallel settings with a volatile underlying infrastructure, such as a network of workstations (where workstations may be taken out of the virtual parallel machine by their owner). Our simulation strategy is Las Vegas (i.e., it may never fail, due to backtracking operations to robustly stored instances of the computation, in case of locally unrecoverable situations). It adopts an adaptive balancing scheme of the workload among the currently live processors of the bsp machine. Our strategy is efficient in the sense that, compared with an optimal off-line adversarial computation under the same sequence of fault occurrences, it achieves an \cal O \left( (log n ⋅log log n) ² \right) multiplicative factor times the optimal work (namely, this measure is in the sense of the ``competitive ratio' of on-line analysis). In addition, our scheme is modular, integrated, and considers many implementation points. We comment that, to our knowledge, no previous work on robust parallel computations has considered fully dynamic faults in the bsp model, or in general distributed memory systems. Furthermore, this is the first time an efficient Las Vegas simulation in this area is achieved. Online publication October 26, 2000. 相似文献

12.

一种检测程序控制流故障的方法

李红兵尚利宏周密金惠华《微计算机信息》2010,(16)

本文介绍一种在容错处理器中实现控制流故障检测的方法。处理器的容错机制是通过修改超标量体系结构,利用时间冗余技术实现的。处理器支持两个指令流并发执行,本文提出的控制流检测算法是通过比较两个时间冗余的指令流的执行结果实现的,与同类实现方案相比,此方法可以进一步节省硬件资源以及额外的处理器执行时间。相似文献

13.

Fault-Tolerant Parallel Scheduling of Tasks on a Heterogeneous High-Performance Workstation Cluster 总被引：1，自引：1，他引：0

Kwok Yu-Kwong 《The Journal of supercomputing》2001,19(3):299-314

We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster. 相似文献

14.

探秘MPP数据库

《互联网天地》2015,(7)

相似文献

15.

FTPA: Supporting Fault-Tolerant Parallel Computing through Parallel Recomputing

Yang Xuejun Du Yunfei Wang Panfeng Fu Hongyi Jia Jia 《Parallel and Distributed Systems, IEEE Transactions on》2009,20(10):1471-1486

As the size of large-scale computer systems increases, their mean-time-between-failures are becoming significantly shorter than the execution time of many current scientific applications. To complete the execution of scientific applications, they must tolerate hardware failures. Conventional rollback-recovery protocols redo the computation of the crashed process since the last checkpoint on a single processor. As a result, the recovery time of all protocols is no less than the time between the last checkpoint and the crash. In this paper, we propose a new application-level fault-tolerant approach for parallel applications called the Fault-Tolerant Parallel Algorithm (FTPA), which provides fast self-recovery. When fail-stop failures occur and are detected, all surviving processes recompute the workload of failed processes in parallel. FTPA, however, requires the user to be involved in fault tolerance. In order to ease the FTPA implementation, we developed Get it Fault-Tolerant (GiFT), a source-to-source precompiler tool to automate the FTPA implementation. We evaluate the performance of FTPA with parallel matrix multiplication and five kernels of NAS Parallel Benchmarks on a cluster system with 1,024 CPUs. The experimental results show that the performance of FTPA is better than the performance of the traditional checkpointing approach. 相似文献

16.

Parallel Matrix Computations Using a Reconfigurable Pipelined Optical Bus

《Journal of Parallel and Distributed Computing》1999,59(1):13-30

We present fast and cost-efficient parallel algorithms for a number of important and fundamental matrix computation problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the inverse, the characteristic polynomial, the determinant, the rank, the Nth power, and an LU- and a QR-factorization of a matrix and solving linear systems of equations. Our algorithms provide a wide range of performance–cost combinations. Compared with known results, the running time of parallel solutions to all these problems can be reduced by a factor of O(log N) while costs are maintained under o(N⁴). 相似文献

17.

Parallel Execution of Iterative Computations on Workstation Clusters

X. Wang E.K. Blum 《Journal of Parallel and Distributed Computing》1996,34(2):218

Methods are developed for transforming sequential programs for iterative computations into parallel-distributed versions which execute in parallel on a cluster of workstation or PC nodes on a local area network. We focus on communication issues and present algorithms for interprocess communication implemented by UNIX TCP/IP socket commands. Results of performance tests on several application problems, such as simulation of neural networks and the Jacobi method for solving linear equations, representative of a large class of application problems are presented. Analysis indicates that, for problems with rather intensive computation, speedups of better than 2p/3 are possible with an optimal numberpof nodes on a single Ethernet bus segment. Preliminary tests on small clusters show efficient speedups even for nonoptimalp. 相似文献

18.

User-Friendly Parallel Computations with Econometric Examples

Michael?Creel Email author 《Computational Economics》2005,26(2):107-128

This paper shows how a high-level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave. JEL Classifications: C13; C14; C15; C63; C87 相似文献

19.

Processor Efficient Parallel Matching

Piotr Sankowski 《Theory of Computing Systems》2008,42(1):73-90

In this paper we consider the problem of finding perfect matchings in parallel. We present a RNC algorithm with almost optimal work with respect to sequential algorithms, i.e., it uses O(n ^ω) processors, where ω is the matrix multiplication exponent. Our algorithm is based on an RNC algorithm for computing determinant of a degree one polynomial matrix which is of independent interest. Research supported by KBN grant 1P03A01830. 相似文献

20.

基于多态并行处理器的生物计算并行实现

刘玉荣李涛《计算机技术与发展》2014,(8):55-58

针对传统的生物计算中DNA序列保守序列的识别（模体识别）和最长公共子序列计算需要较大的数据量、计算量,以及功耗大等问题,文中提出了两种基于PAAG多态并行处理器的并行算法,该并行处理器能够支持数据、线程、指令多种并行。通过编程在PAAG多态并行处理的处理单元（ PE）上开发了相应的串行和并行程序,将计算的不同过程分派到不同的处理单元（ PE）上进行处理,实现了不同粒度算法的并行。实验结果表明,文中提出的并行算法使模体识别和最长公共子序列的计算效率得到明显提高。相似文献