Similar literature
1.
In this paper, we describe a massively parallel implementation of the Splitting Equilibration Algorithm using CM FORTRAN on the Thinking Machines CM-2 system. Numerical results using upwards of 32 768 (32 K) processors on the CM-2 system, the Connection Machine, are presented for both input/output and social accounting matrix estimation problems and compared with those obtained for the same problems on the IBM 3090. Our experiences with the relative ease/difficulty of the implementations on these fine-grain and coarse-grain parallel architectures are also presented and discussed.

2.
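In view of the current trend in demand for parallel computing across application domains, this paper discusses the prospects for parallel component development technology in high-performance computing. It proposes implementing domain-specific parallel libraries under the SPMD model in stages. On this basis, using the Sunway (神威) parallel computer, a parallel library for pseudo-particle simulation of particle-fluid systems is preliminarily designed and implemented. Experiments show that the approach can effectively improve the development efficiency of parallel programs.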

3.
The HIRLAM (high resolution limited area modelling) limited-area atmospheric model was originally developed and optimized for shared memory vector-based computers, and has been used for operational weather forecasting on such machines for several years. This paper describes the algorithms applied to obtain a highly parallel implementation of the model, suitable for distributed memory machines. The performance results presented indicate that the parallelization effort has been successful, and the Norwegian Meteorological Institute will run the parallel version in production on a Cray T3E.

4.
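The Multiphase Reaction Laboratory of the Institute of Process Engineering, Chinese Academy of Sciences, has built a general-purpose particle simulation platform and put it into use. The Shift parallel communication pattern used by similar parallel simulation systems has some problems, and a new communication pattern is needed to make up for its shortcomings. This paper designs All2All, an unstructured communication pattern with good generality, to handle communication between compute nodes in the general-purpose particle-method simulation platform. The test cases in this paper show that this pattern can solve the data communication problem between neighbouring nodes with complex topological relationships, which the Shift pattern cannot handle in parallel particle simulation. With only minor modification, the All2All communication pattern designed here can be readily applied to parallel computing systems in other fields.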

5.
The computational complexity of a parallel algorithm depends critically on the model of computation. We describe a simple and elegant rule-based model of computation in which processors apply rules asynchronously to pairs of objects from a global object space. Application of a rule to a pair of objects results in the creation of a new object if the objects satisfy the guard of the rule. The model can be efficiently implemented as a novel MIMD array processor architecture, the Intersecting Broadcast Machine. For this model of computation, we describe an efficient parallel sorting algorithm based on mergesort. The computational complexity of the sorting algorithm is O(n log^2 n), comparable to that for specialized sorting networks and an improvement on the O(n^1.5) complexity of conventional mesh-connected array processors.

6.
The implementation of the GESIMA mesoscale atmospheric model on message passing, distributed memory parallel computers is presented. Particular emphasis is given to the parallelization of the conjugate gradient solver using pre-conditioning by an incomplete LU factorization. Performance results are presented for the Cray T3D and Cray T3E systems, which show good scalability over a range of problem sizes and numbers of processors.

7.
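Implementation of a Parallel LDL^T Factorization Algorithm for Symmetric Positive Definite Matrices   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on networked clusters as a new parallel environment and the Message Passing Interface (MPI), two square-root-free parallel Cholesky factorization algorithms are given. The algorithms use a row-wrapped storage scheme and an early-send strategy, which reduces load imbalance, increases the overlap of computation and communication, and reduces communication time. Theoretical analysis and numerical experiments both show that the algorithms achieve high parallel speedup and efficiency.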
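For readers unfamiliar with the square-root-free variant of Cholesky factorization, the following is a minimal sequential sketch of A = L D L^T. It is not the parallel MPI algorithm of the abstract above: the row-wrapped storage and early-send strategy are not modelled, and the function name `ldlt` and the list-of-rows matrix layout are assumptions made for illustration.

```python
def ldlt(a):
    """Factor a symmetric positive definite matrix as A = L * D * L^T.

    `a` is a list of rows. Returns (L, d), where L is unit lower
    triangular and d holds the diagonal of D. No square roots are taken.
    """
    n = len(a)
    # Start from the identity so that L has a unit diagonal.
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: d_j = a_jj - sum_k L_jk^2 * d_k
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            # Below-diagonal entry: L_ij = (a_ij - sum_k L_ik * L_jk * d_k) / d_j
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    L, d = ldlt(A)
    print(L, d)
```

In a row-distributed parallel version, each process would own some rows of L, and the work for column j depends on row j being available, which is presumably why sending that row early helps overlap communication with computation.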

8.
A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared address space paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
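The PCG method named in this abstract can be illustrated with a small serial sketch. The abstract does not say which preconditioner was used, so the Jacobi (diagonal) preconditioner below is an assumption, as are the function name `pcg`, the dense list-of-rows matrix, and the tolerance defaults. None of the MPI, PETSc or OpenMP parallelization is shown.

```python
def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of rows)
    with conjugate gradients and a Jacobi (diagonal) preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                              # residual b - A x for x = 0
    z = [r[i] / a[i][i] for i in range(n)]   # apply preconditioner: z = M^-1 r
    p = list(z)                              # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:          # stop once the residual norm is small
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)              # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [r[i] / a[i][i] for i in range(n)]
        rz_new = dot(r, z)
        # New direction, conjugate to the previous ones with respect to A.
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

In a parallel setting the matrix-vector product and the two inner products per iteration are the steps that need communication, which is why the choice of programming model matters for this solver.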

9.
A compressible model able to manage incompressible two-phase flows as well as compressible motions is proposed. After a presentation of the multiphase compressible concept, the new model and related numerical methods are detailed on fixed structured grids. The presented model is a 1-fluid model with a reformulated mass conservation equation which takes into account the effects of compressibility. The coupling between pressure and flow velocity is ensured by introducing mass conservation terms in the momentum and energy equations. The numerical model is then validated with four test cases involving the compression of an air bubble by water, the liquid injection in a closed cavity filled with air, a bubble subjected to an ultrasound field and finally the oscillations of a deformed air bubble in melted steel. The numerical results are compared with analytical results and convergence orders in space are provided.

10.
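王占杰, 李锐, 肖侯亮. 《计算机工程与设计》 (Computer Engineering and Design), 2006, 27(7): 1251-1253, 1264
To make reasonable and effective use of an enterprise's computing resources and to process enterprise business quickly and efficiently, an enterprise-level parallel processing model is built. The paper proposes separating business processing from business requests, introduces the remoting mechanism of the .NET Framework, and solves related problems such as service registration, dynamic loading, request and implementation. An example shows that the model can effectively use enterprise computing resources to carry out parallel business processing.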

11.
An increasing awareness of the need for high speed parallel processing systems for image analysis has stimulated a great deal of interest in the design and development of such systems. Efficient processing schemes for several specific problems have been developed, providing some insight into the general problems encountered in designing efficient image processing algorithms for parallel architectures. However, it is still not clear what architecture or architectures are best suited for image processing in general, or how one may go about determining those which are. An approach that would allow application requirements to specify architectural features would be useful in this context. Working towards this goal, general principles are outlined for formulating parallel image processing tasks by exploiting parallelism in the algorithms and data structures employed. A synchronous parallel processing model is proposed which governs the communication and interaction between these tasks. This model presents a uniform framework for comparing and contrasting different formulation strategies. In addition, techniques are developed for analyzing instances of this model to determine a high level specification of a parallel architecture that best ‘matches’ the requirements of the corresponding application. It is also possible to derive initial estimates of the component capabilities that are required to achieve predefined performance levels. Such analysis tools are useful in the design stage, in the selection of a specific parallel architecture, and in efficiently utilizing an existing one. In addition, the architecture independent specification of application requirements makes it a useful tool for benchmarking applications.

12.
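The Vision and Implementation of MDA   Cited by: 7 (self-citations: 1, citations by others: 7)
Model Driven Architecture (MDA) proposes an approach to software development based on models, treating the model as the key to the software development process. The basic idea of MDA is to distinguish platform independent models from platform specific models, and to carry out software development through transformations between models at different levels of abstraction. This paper introduces the vision of MDA, including its basic concepts and the MDA-based development process, and discusses several important issues involved in realizing MDA.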

13.
Direct numerical simulations of turbulent channel flows between isothermal walls have been carried out using the discontinuous Galerkin method. Three Mach numbers are considered (0.2, 0.7, and 1.5) at a fixed Reynolds number ≈2800, based on the bulk velocity, bulk density, half channel width, and dynamic viscosity at the wall. Power law and log-law scalings of the mean streamwise velocity are considered to study their performance on compressible flows and their dependence on Mach number. The results indicate that the power law seems slightly better and less dependent on Mach number than the log-law in the overlap region. Mach number effects on the second-order (velocity, pressure, density, temperature, shear stress, and vorticity fluctuations) and higher-order (skewness and flatness of velocity, pressure, density, and temperature fluctuations) statistics are explored and discussed. Both inner (that is, wall variables) and outer (that is, global) scalings (with Mach number) are considered. It is found that for some second-order statistics (i.e. velocity, density, and temperature), the outer scaling collapses better than the inner scaling. It is also found that near-wall large-scale motions are affected by Mach number. The near-wall spanwise streak spacing increases with increasing Mach number. Iso-surfaces of the second invariant of the velocity gradient tensor are more sparsely distributed and elongated as Mach number increases, which is similar to the distribution of near-wall low speed streaks.

14.
SCAN is a special purpose context-free language which describes and generates a wide range of array accessing algorithms from a short set of simple ones. These algorithms may represent scan techniques for image processing, but at the same time they stand as generic data accessing strategies. In this paper we present two schemes (one sequential and one parallel) which implement the SCAN language and compare their memory requirements and execution time.

15.
The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to obtain a close estimation of the density of states iteratively. It has been applied successfully to many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm on computers of shared memory architectures by utilizing the OpenMP API for distributed computing. This implementation is applied to Ising model systems with promising speedups. We also examine the effects on the running speed when using different strategies in accessing the shared memory space during the updating procedure. The allowance of data race is recommended in consideration of the simulation efficiency. Such treatment does not affect the accuracy of the final density of states obtained.
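The random walk and iterative update described in this abstract can be sketched serially for a very small system. The code below is a single-threaded illustration, not the OpenMP implementation of the paper. The choice of a one-dimensional Ising ring, the function name `wang_landau_ising_ring`, and all parameter defaults (flatness threshold, final modification factor, sweep length, seed) are assumptions made for illustration.

```python
import math
import random


def wang_landau_ising_ring(n_spins=4, ln_f_final=1e-4, flatness=0.8, seed=0):
    """Estimate ln g(E) for a 1D Ising ring with E = -sum_i s_i * s_(i+1).

    Returns a dict mapping each visited energy to an estimate of the log
    density of states, defined only up to an additive constant.
    """
    rng = random.Random(seed)
    spins = [1] * n_spins
    energy = -n_spins            # all spins aligned
    ln_g = {}                    # running estimate of ln g(E)
    hist = {}                    # visit histogram for the current stage
    ln_f = 1.0                   # log of the modification factor
    while ln_f > ln_f_final:
        for _ in range(1000 * n_spins):
            i = rng.randrange(n_spins)
            # Energy change from flipping spin i (periodic neighbours).
            d_e = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
            new_e = energy + d_e
            # Accept with probability min(1, g(E) / g(E')).
            u = 1.0 - rng.random()   # uniform in (0, 1], so log(u) is finite
            if math.log(u) < ln_g.get(energy, 0.0) - ln_g.get(new_e, 0.0):
                spins[i] = -spins[i]
                energy = new_e
            # Update the estimate and the histogram at the current energy.
            ln_g[energy] = ln_g.get(energy, 0.0) + ln_f
            hist[energy] = hist.get(energy, 0) + 1
        # When the histogram is flat enough, refine the modification factor.
        mean = sum(hist.values()) / len(hist)
        if min(hist.values()) >= flatness * mean:
            ln_f /= 2.0
            hist = {e: 0 for e in ln_g}
    return ln_g


if __name__ == "__main__":
    print(wang_landau_ising_ring())
```

For a ring of four spins the exact density of states is g(-4) = 2, g(0) = 12 and g(4) = 2, so the difference ln g(0) - ln g(-4) should come out near ln 6. In a shared-memory version, several walkers would update the same `ln_g` and `hist` tables concurrently, which is the step where the abstract's data-race question arises.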

16.
Although ERP systems were introduced many years ago and have been implemented in different organizations, there are still companies that hesitate to establish ERP systems in their structure. This hesitation can itself cause projects to fail. On the other hand, in Iranian organizations, unfamiliarity with these systems is evident. It stems from decision-makers' and managers' lack of information on the issue, together with a feeling of fear and unease about this novel technology.

17.
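For solving the nonsymmetric sparse linear systems that arise from discretizing a multi-scale forecast model, an improved GCR(k) algorithm, IGCR(k), is given. It exploits inherent properties of the GCR(k) algorithm to eliminate the data dependence in its inner product computations. IGCR(k) has the same convergence as GCR(k), and when run in parallel on an MPI-based distributed-memory cluster, the number of synchronization operations is reduced to half that of GCR(k). Numerical results and theoretical analysis show that the improved GCR(k) algorithm outperforms the GCR(k) algorithm.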

18.
19.
Formalized study of self-assembly has led to the definition of the tile assembly model. Previously, I presented ways to compute arithmetic functions, such as addition and multiplication, in the tile assembly model: a highly distributed parallel model of computation that may be implemented using molecules or a large computer network such as the Internet. Here, I present tile assembly model systems that factor numbers nondeterministically using Θ(1) distinct components. The computation takes advantage of nondeterminism, but theoretically, each of the nondeterministic paths is executed in parallel, yielding the solution in time linear in the size of the input, with high probability. I describe mechanisms for finding the successful solutions among the many parallel executions, explore bounds on the probability of such a nondeterministic system succeeding, and prove that the probability can be made arbitrarily close to 1.

20.
We propose a model of parallel computation, the YPRAM, that allows general parallel algorithms to be designed for a wide class of parallel models. The basic model captures locality among processors, which is measured as a function of two parameters: latency and bandwidth.

We design YPRAM algorithms for solving several fundamental problems: parallel prefix, sorting, sorting numbers from a bounded range, and list ranking. We show that our model predicts, reasonably accurately, the actual known performances of several basic parallel models (PRAM, hypercube, mesh and tree) when solving these problems.

