
1.

Optimum structural design with parallel finite element analysis (total citations: 3; self-citations: 0; citations by others: 3)

M.E.M. El-Sayed, C.-K. Hsiung 《Computers & Structures》1991,40(6):1469-1474

Structural analysis is an important part of the optimum structural design process. Therefore, extra effort should be devoted to making this part as efficient as possible. Since finite element analysis is the most powerful and widely used tool in the structural analysis field, this paper presents a new method for structural optimization by the parallel finite element method. This method divides the original structure into several substructures and assigns each substructure to one processor. Each processor handles its finite element calculation independently, with limited communication between processors. Some numerical examples on the Cray X-MP multiprocessor system are presented together with the speedups obtained.

2.

HiMAP: a portable super modular multilevel parallel multidisciplinary process for large scale analysis

《Advances in Engineering Software》2000,31(8-9):617-620


3.

Optimizing computing costs using divisible load analysis

Jeeho Sohn, Robertazzi T.G., Luryi S. 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(3):225-234

A bus-oriented network in which there is a charge for the amount of divisible load processed on each processor is investigated. A cost-optimal processor sequencing result is found, which involves assigning load to processors in nondecreasing order of each processor's cost per unit load. More generally, one can trade cost against solution time. Algorithms are presented to minimize computing cost with an upper bound on solution time and to minimize solution time with an upper bound on cost. As an example of the use of this type of analysis, the effect of replacing one fast but expensive processor with a number of cheap but slow processors is also discussed. The types of questions investigated here are important for future computer utilities that perform distributed computation for a charge.

4.

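A parallel Gauss-Seidel iterative algorithm for solving linear systems on a local area network

Shang Yueqiang 《Computer Applications and Software (计算机应用与软件)》2008,25(9)

To suit networked parallel environments, where computing power is high but communication is comparatively slow, a parallel Gauss-Seidel iterative algorithm for solving linear systems on a local area network is presented. The algorithm partitions the coefficient matrix and the right-hand side into row blocks and stores the blocks on the processors in a wrap-around (cyclic) fashion. In each iteration, the solution components already computed are passed around the processors in a ring, which reduces interprocessor communication overhead and improves the efficiency of the parallel algorithm. Experimental results show that the algorithm achieves high parallel efficiency and speedup.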
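The wrap-around storage of row blocks mentioned in this abstract can be illustrated with a minimal sketch. The function names `wrap_blocks` and `gauss_seidel`, the block size, and the rule "block k goes to processor k mod p" are assumptions made for illustration; the abstract does not give the paper's actual layout or communication code, and the solver below is the ordinary serial Gauss-Seidel sweep rather than the parallel variant.

```python
def wrap_blocks(n_rows, block_size, n_procs):
    """Map each processor to the row indices it would store under a
    wrap-around (block-cyclic) layout: block k goes to processor k % n_procs.
    This layout rule is an assumption, not taken from the paper."""
    owners = {p: [] for p in range(n_procs)}
    for k, start in enumerate(range(0, n_rows, block_size)):
        rows = range(start, min(start + block_size, n_rows))
        owners[k % n_procs].extend(rows)
    return owners


def gauss_seidel(A, b, sweeps=50):
    """Serial Gauss-Seidel: every update of x[i] uses the newest values
    of the other components, which is what forces the parallel version
    to pass freshly computed components between processors."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


layout = wrap_blocks(n_rows=8, block_size=2, n_procs=2)
solution = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

With 8 rows, blocks of 2 rows, and 2 processors, consecutive blocks alternate between the two processors, so each processor holds rows from both the top and the bottom of the matrix.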

5.

High Performance Computations for Large Scale Simulations of Subsurface Multiphase Fluid and Heat Flow

Elmroth Erik, Ding Chris, Wu Yu-Shu 《The Journal of supercomputing》2001,18(3):235-258

TOUGH2 is a widely used reservoir simulator for solving subsurface flow related problems such as nuclear waste geologic isolation, environmental remediation of soil and groundwater contamination, and geothermal reservoir engineering. It solves a set of coupled mass and energy balance equations using a finite volume method. This contribution presents the design and analysis of a parallel version of TOUGH2. The parallel implementation first partitions the unstructured computational domain. For each time step, a set of coupled non-linear equations is solved with Newton iteration. In each Newton step, a Jacobian matrix is calculated and an ill-conditioned non-symmetric linear system is solved using a preconditioned iterative solver. Communication is required for convergence tests and data exchange across partitioning borders. Parallel performance results on Cray T3E-900 are presented for two real application problems arising in the Yucca Mountain nuclear waste site study. The execution time is reduced from 7504 seconds on two processors to 126 seconds on 128 processors for a 2D problem involving 52,752 equations. For a larger 3D problem with 293,928 equations the time decreases from 10,055 seconds on 16 processors to 329 seconds on 512 processors.
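The timings quoted in this abstract can be turned into relative speedup and parallel efficiency figures. The sketch below only does that arithmetic; the helper name `relative_speedup` is hypothetical, and the baseline is the smallest processor count reported for each problem, not a single processor.

```python
def relative_speedup(t_base, p_base, t_large, p_large):
    """Return (speedup, efficiency) relative to the baseline run.

    speedup    = t_base / t_large
    efficiency = speedup / (p_large / p_base), i.e. the fraction of the
                 ideal speedup that was actually obtained.
    """
    speedup = t_base / t_large
    ideal = p_large / p_base
    return speedup, speedup / ideal


# 2D problem: 7504 s on 2 processors -> 126 s on 128 processors
s2d, e2d = relative_speedup(7504, 2, 126, 128)
# 3D problem: 10,055 s on 16 processors -> 329 s on 512 processors
s3d, e3d = relative_speedup(10055, 16, 329, 512)

print(f"2D: speedup {s2d:.2f} of ideal 64, efficiency {e2d:.2f}")
print(f"3D: speedup {s3d:.2f} of ideal 32, efficiency {e3d:.2f}")
```

Under this definition the 2D run reaches a relative speedup of about 59.6 out of an ideal 64, and the 3D run about 30.6 out of an ideal 32.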

6.

Multiprocessor implementation of digital filtering algorithms using a parallel block processing method

Sung W., Mitra S.K., Jeren B. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):110-120

An efficient real-time implementation of digital filtering algorithms using a multiprocessor system in a ring network is investigated. This method is based on a parallel block processing approach, in which a continuously supplied input data stream is divided into blocks, and the blocks are processed concurrently by assigning them to the processors in the system. This approach requires only a simple interconnection network and significantly reduces the number of communications among the processors, making the system easily expandable and highly efficient. In addition, various digital signal processing algorithms can be implemented on the same multiprocessor system. The data dependency of the blocks to be processed concurrently gives rise to dependency problems between the processors. A systematic scheduling method has been developed by using a precedence graph for the analysis of the dependency relation. Methods for solving the dependency problems between the processors are also investigated. Implementation procedures and results for FIR, recursive, and adaptive filtering algorithms are illustrated.
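For the FIR case, block processing is easy to sketch because each output block depends only on its own input block plus the last `len(h) - 1` samples of the preceding input. The code below is a minimal serial illustration under that assumption; the names `fir_direct` and `fir_block` are hypothetical, and the paper's scheduling on a ring network (and its treatment of recursive and adaptive filters, where blocks do depend on one another) is not reproduced.

```python
def fir_direct(x, h):
    """Reference FIR filter: y[n] = sum_k h[k] * x[n - k], x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_block(x, h, block_len):
    """Filter x block by block. Each block is extended backwards by
    len(h) - 1 input samples, so blocks can be computed independently
    (here in a loop; on a multiprocessor, one block per processor)."""
    history = len(h) - 1
    out = []
    for start in range(0, len(x), block_len):
        lo = max(0, start - history)
        chunk = x[lo:start + block_len]
        y = fir_direct(chunk, h)
        out.extend(y[start - lo:])   # drop outputs that belong to earlier blocks
    return out


signal = list(range(1, 11))
taps = [1, 2, 3]
blocked = fir_block(signal, taps, block_len=4)
direct = fir_direct(signal, taps)
```

Because the blocks overlap only on the input side, the block-wise result matches the direct filter exactly for integer data.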

7.

Task allocation model for distributed systems

G. Sagar, Anil K. Sarje 《International journal of systems science》2013,44(9):1671-1678

The problem of distributing tasks to processors in a distributed computing system is addressed. A task should be assigned to the processor whose capabilities are most appropriate for executing it, while excessive interprocessor communication should be avoided. A simple algorithm for task allocation is presented. The execution costs and communication costs of the tasks are represented by arrays. A task is either assigned to a processor or fused with another task using a simple criterion. The execution and communication costs are then modified suitably. The process continues until all the tasks are assigned to processors. This algorithm also facilitates incorporation of various system constraints. It is applicable to random program structures and to a system containing any number of processors.

8.

An improved heuristic for minimizing makespan among m identical parallel processors

John H. Blackstone Jr., Don T. Phillips 《Computers & Industrial Engineering》1981,5(4):279-287

A simple heuristic for minimizing makespan among identical processors assigned independent tasks is presented and explored. The heuristic initially assigns jobs to processors by applying a quick and effective algorithm which at present is commonly applied to this problem. The heuristic then seeks to identify pairs of jobs that may be interchanged between processors to improve the solution. The conditions under which an optimal makespan may be achieved by such an interchange are derived for the case of two processors. The procedure is then extended to three or more processors. Results of 700 randomly generated problems are reported. The heuristic achieved an optimal solution for most of the problems. The worst case performance for the heuristic has not been established; however, evidence is presented that the worst case for the heuristic is considerably smaller than that of algorithms presently used.
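The two-phase idea in this abstract (a quick initial assignment followed by pairwise job interchanges) can be sketched as follows. The abstract does not name the initial algorithm; the sketch assumes longest-processing-time-first (LPT) list scheduling, and the interchange rule shown is a simple illustrative one, not the conditions derived in the paper. The names `lpt`, `improve_by_interchange`, and `makespan` are hypothetical.

```python
def lpt(jobs, m):
    """Assumed initial phase: give each job, longest first, to the
    currently least-loaded of m identical processors."""
    loads = [0] * m
    assign = [[] for _ in range(m)]
    for t in sorted(jobs, reverse=True):
        i = loads.index(min(loads))
        assign[i].append(t)
        loads[i] += t
    return assign


def makespan(assign):
    return max(sum(a) for a in assign)


def improve_by_interchange(assign):
    """Repeatedly swap one job on the most-loaded processor with a shorter
    job on another processor when the swap lowers the larger of the two
    loads. Each swap reduces the sum of squared loads, so the loop ends."""
    improved = True
    while improved:
        improved = False
        loads = [sum(a) for a in assign]
        hi = loads.index(max(loads))
        for lo in range(len(assign)):
            if lo == hi or improved:
                continue
            gap = loads[hi] - loads[lo]
            for i, a in enumerate(assign[hi]):
                # find a job b on processor lo with 0 < a - b < gap
                j = next((j for j, b in enumerate(assign[lo])
                          if 0 < a - b < gap), None)
                if j is not None:
                    assign[hi][i], assign[lo][j] = assign[lo][j], a
                    improved = True
                    break
    return assign


schedule = lpt([3, 3, 2, 2, 2], m=2)
before = makespan(schedule)
after = makespan(improve_by_interchange(schedule))
```

For the job set 3, 3, 2, 2, 2 on two processors, the LPT phase yields a makespan of 7, and a single interchange of a 3 with a 2 brings it down to the optimum of 6.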

9.

On nonlinear finite element analysis in single-, multi- and parallel-processors

Senol Utku, Robert Melosh, Munirul Islam, Moktar Salama 《Computers & Structures》1982,15(1):39-47

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.

10.

A hybrid message passing/shared memory parallelization of the adaptive integral method for multi-core clusters

Fangzhou Wei, Ali E. Yilmaz 《Parallel Computing》2011,37(6-7):279-301

A hybrid message passing and shared memory parallelization technique is presented for improving the scalability of the adaptive integral method (AIM), an FFT based algorithm, on clusters of identical multi-core processors. The proposed hybrid MPI/OpenMP parallelization scheme is based on a nested one-dimensional (1-D) slab decomposition of the 3-D auxiliary regular grid and the associated AIM calculations: If there are M processors and T cores per processor, the scheme (i) divides the regular grid into M slabs and MT sub-slabs, (ii) assigns each slab/sub-slab and the associated operations to one of the processors/cores, and (iii) uses MPI for inter-processor data communication and OpenMP for intra-processor data exchange. The MPI/OpenMP parallel AIM is used to accelerate the solution of the combined-field integral equation pertinent to the analysis of time-harmonic electromagnetic scattering from perfectly conducting surfaces. The scalability of the scheme is investigated theoretically and verified on a state-of-the-art multi-core cluster for benchmark scattering problems. Timing and speedup results on up to 1024 quad-core processors show that the hybrid MPI/OpenMP parallelization of AIM exhibits better strong scalability (fixed problem size speedup) than pure MPI parallelization of it when multiple cores are used on each processor.
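The nested slab decomposition in steps (i) and (ii) can be pictured as an index computation. The sketch below assumes that slabs and sub-slabs are contiguous ranges of grid planes along the same axis and that sizes are balanced to within one plane; both are illustrative assumptions, and the helper names `split` and `nested_slabs` are hypothetical. No MPI or OpenMP calls are involved.

```python
def split(start, stop, parts):
    """Split the half-open range [start, stop) into `parts` contiguous
    ranges whose lengths differ by at most one."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def nested_slabs(n_planes, M, T):
    """Return {(processor, core): (first_plane, last_plane_exclusive)}:
    M slabs (one per processor), each cut into T sub-slabs (one per core)."""
    return {(p, c): sub
            for p, (lo, hi) in enumerate(split(0, n_planes, M))
            for c, sub in enumerate(split(lo, hi, T))}


layout = nested_slabs(n_planes=16, M=2, T=4)
```

With 16 grid planes, M = 2 and T = 4, the layout has M·T = 8 sub-slabs of two planes each; processor 0 owns planes 0 to 7 and processor 1 owns planes 8 to 15.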

11.

Computing the velocity of a rotating flow

L.K. Lundin 《Parallel Computing》1998,24(14):2021-2034

To compute the time-dependent flow of a rotating incompressible fluid we consider the velocity–vorticity formulation of the Navier–Stokes equations in cylindrical coordinates. In the numerical method employed the velocity field at each time-step is found as the least squares solution of an overdetermined system of linear equations, Ax=b. We consider how to compute x using the preconditioned conjugate gradient algorithm for least squares (PCGLS) on a distributed parallel computer. The various aspects of using a parallel computer are discussed, and results for a wide range of parallel computers are presented. The parallel speed-up depends on the architecture but is typically about 80% of the number of processors used.
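The least squares solve at the heart of this abstract can be sketched with the conjugate gradient method for least squares (CGLS). The version below omits the preconditioner, runs serially on plain Python lists, and assumes A has full column rank; it is an illustration of the basic iteration, not the paper's PCGLS implementation. The function name `cgls` and its parameters are hypothetical.

```python
def cgls(A, b, iters=50, tol=1e-12):
    """Unpreconditioned CGLS: minimizes ||A x - b||_2 using only products
    with A and with its transpose (the products a parallel code would
    distribute across processors)."""
    m, n = len(A), len(A[0])

    def matvec(v):      # A @ v
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]

    def rmatvec(u):     # A^T @ u
        return [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    x = [0.0] * n
    r = list(b)                 # residual b - A x for x = 0
    s = rmatvec(r)              # gradient direction A^T r
    p = list(s)
    gamma = dot(s, s)
    for _ in range(iters):
        if gamma <= tol:        # normal-equation residual is small enough
            break
        q = matvec(p)
        alpha = gamma / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Overdetermined 3x2 system; its least squares solution is (4/3, 7/3).
x_ls = cgls([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 4.0])
```

The small example can be checked by hand through the normal equations: AᵀA = [[2, 1], [1, 2]] and Aᵀb = (5, 6), which gives x = (4/3, 7/3).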

12.

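A networked parallel iterative algorithm for linear systems based on PVM (total citations: 1; self-citations: 0; citations by others: 1)

Shang Yueqiang, Yang Yidu 《Computer Applications and Software (计算机应用与软件)》2006,23(11):50-51

In network parallel computing environments built from desktop PCs connected through PVM, the processors compute quickly while communication between processors is comparatively slow. For this setting, a grouped Gauss-Seidel parallel iterative algorithm for solving linear systems is proposed. The algorithm partitions the augmented matrix of the linear system into row blocks that are stored on the individual processors, and each processor performs Gauss-Seidel iterations on its own block; the scheme needs little interprocessor communication and is easy to implement. Numerical experiments were carried out on a local area network of 1 to 24 desktop PCs, programmed on a parallel computing platform of PVM 3.4 on Windows 2000 with VC 6.0. The results show that the algorithm outperforms both the traditional parallel Jacobi iteration and the traditional parallel Gauss-Seidel iteration.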
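One way to read the grouped scheme in this abstract is as a hybrid iteration: within its own row block a processor uses Gauss-Seidel updates, while components owned by other processors are taken from the previous outer iteration. The sketch below simulates that reading serially. It is an interpretation made for illustration, not the paper's PVM code; the name `grouped_gauss_seidel`, the block layout, and the fixed iteration count are assumptions. Convergence is assured here because the test matrix is strictly diagonally dominant.

```python
def grouped_gauss_seidel(A, b, blocks, outer=100):
    """Serial simulation of a block-wise Gauss-Seidel iteration.

    blocks: list of row-index lists, one per (simulated) processor.
    Inside a block, updates use the newest local values (Gauss-Seidel);
    values from other blocks come from the previous outer iteration,
    so blocks could be processed concurrently."""
    n = len(b)
    x = [0.0] * n
    for _ in range(outer):
        x_new = list(x)
        for rows in blocks:
            local = list(x)          # snapshot "received" from the other processors
            for i in rows:
                s = sum(A[i][j] * local[j] for j in range(n) if j != i)
                local[i] = (b[i] - s) / A[i][i]
            for i in rows:
                x_new[i] = local[i]  # publish this block's updated components
        x = x_new
    return x


A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
b = [10.0, 12.0, 18.0, 20.0]       # chosen so that the exact solution is (1, 2, 3, 4)
x_blocks = grouped_gauss_seidel(A, b, blocks=[[0, 1], [2, 3]])
```

Only one exchange of solution components per outer iteration is needed in this arrangement, which matches the abstract's emphasis on low communication.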

13.

Development of parallel 3D RKPM meshless bulk forming simulation system

《Advances in Engineering Software》2007,38(2):87-101

A parallel computational implementation of a modern meshless system is presented for explicit 3D bulk forming simulation problems. The system is implemented using the reproducing kernel particle method. Aspects of a coarse-grain parallel paradigm, the domain decomposition method, are detailed for a Lagrangian formulation using model partitioning. Integration cells are uniquely assigned to each processing element, and particles overlap in boundary zones. A multilevel recursive spectral bisection partitioning scheme is applied. The parallel contact search algorithm is also presented. Explicit message passing interface statements are used for all communication among partitions on different processors. The parallel 3D system is developed and applied to 3D bulk metal forming problems, and the simulation results demonstrate the efficiency of the developed parallel reproducing kernel particle method system.

14.

Discovery of inexact concepts from structural data

Holder L.B., Cook D.J. 《Knowledge and Data Engineering, IEEE Transactions on》1993,5(6):992-994


15.

Parallel thinning algorithms on multicomputers: experimental study on load balancing

M. G. Montoya, C. Gil, I. Garcia 《Concurrency and Computation》2000,12(5):327-340

In this work, a practical implementation of two parallel thinning algorithms on a multicomputer system is described. The solution has been conceived for a multiprocessor using the SPMD (single program multiple data) programming model. Our main goal is to describe our experience with data partitioning and distribution among processors for parallel thinning algorithms, as a representative type of algorithm in which communication takes place between neighboring processors and the work load of each processor depends on the input data. It is shown how the efficiency of the parallel implementation can be improved by applying a preprocessing step based on an analysis of the work load balance. An analysis of the communication cost is also made. Although the results shown here concern the implementations of two parallel thinning algorithms, we think that our proposal for data distribution is general and useful for a wide set of algorithms in the field of image processing. Copyright © 2000 John Wiley & Sons, Ltd.

16.

The CLIP7A image processor (total citations: 1; self-citations: 0; citations by others: 1)

Fountain T.J., Matthews K.N., Duff M.J.B. 《IEEE transactions on pattern analysis and machine intelligence》1988,10(3):310-319


17.

Optimal design of structures with multiple design variables per group and multiple loading conditions on the personal computer

《Computers & Structures》1986,22(2):179-184

A finite element based programming system for minimum weight design of a truss-type structure subjected to displacement, stress, and lower and upper bounds on design variables is presented. The programming system consists of a number of independent processors, each performing a specific task. These processors, however, are interfaced through a well-organized data base, thus making the tasks of modifying, updating, or expanding the programming system much easier in a friendly environment provided by many inexpensive personal computers. The proposed software can be viewed as an important step in achieving a “dummy” finite element for optimization. The programming system has been implemented on both large and small computers (such as VAX, CYBER, IBM-PC, and APPLE) although the focus is on the latter. Examples are presented to demonstrate the capabilities of the code. The present programming system can be used stand-alone or as part of the multilevel decomposition procedure to obtain optimum design for very large scale structural systems. Furthermore, other related research areas such as developing optimization algorithms (or in the larger level: a structural synthesis program) for future trends in using parallel computers may also benefit from this study.

18.

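An external conflict resolution scheme for R10000 multiprocessor clusters

Yi Jiawang 《Computer Era (计算机时代)》2012,(6):1-3,6

While building a multiprocessor system with a cluster bus based on the MIPS R10000 processor, it was found that the external conflict resolution scheme given in the R10000 user manual applies only to uniprocessor or multiprocessor systems that use a dedicated external agent (EA). In view of this, the paper introduces the system configurations of the R10000 processor and the coherency of its system interface, analyzes the limitations of the external conflict resolution scheme given in the R10000 user manual, and, building on that scheme, studies external conflicts in multiprocessor systems that use a cluster bus and presents an external conflict resolution scheme that the cluster coordinator can adopt.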

19.

The Connection Machine: PDE solution on 65 536 processors

《Parallel Computing》1988,9(1):1-24

The Connection Machine is a massively parallel architecture with 65 536 single-bit processors and 32 Mbytes of memory, organized as a high-dimensional hypercube. A sophisticated router system provides efficient communication between remote processors. A rich software environment, including a parallel extension of COMMON LISP, provides access to the processors and network. Virtual processor capability extends the degree of fine-grained parallelism beyond 1 000 000. We describe the hardware and the parallel programming environment. We then present implementations of SOR, Multigrid and Conjugate Gradient algorithms for solving Partial Differential Equations on the Connection Machine. Measurements of computational efficiency are provided as well as an analysis of opportunities for achieving better performance. Despite the lack of floating-point hardware, computation rates above 100 Mflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly.

20.

Multiprocessing for ray tracing: a hierarchical self-balancing approach (total citations: 1; self-citations: 1; citations by others: 0)

Isaac D. Scherson, Elisha Caspary 《The Visual computer》1988,4(4):188-196
