Similar Articles
1.
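This paper presents an in-house parallel code for the numerical simulation of hypersonic flows, developed with the MPI message-passing model. The code solves laminar-flow problems with the three-dimensional Navier-Stokes equations as the governing equations and discretizes the computational domain with a finite-volume method on structured grids. Convective fluxes are computed with the AUSMPW+ scheme, higher-order accuracy is obtained through MUSCL interpolation, and time integration uses the LU-SGS method to accelerate convergence to steady-state solutions. Large-scale parallel computations of various hypersonic flows on a high-performance computer show that the CFD code achieves high parallel efficiency, providing an efficient tool for accurately predicting the aerodynamic forces and heating of hypersonic vehicles.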
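The first abstract credits MUSCL interpolation for its higher-order accuracy. As a rough illustration only, the Python sketch below reconstructs left and right interface states from 1D cell averages using a minmod slope limiter. The limiter, the uniform 1D grid, and the function names are assumptions made for illustration; the abstract gives none of these details, and the AUSMPW+ flux that would consume these states is not shown.

```python
def minmod(a: float, b: float) -> float:
    """Minmod limiter: the smaller-magnitude slope if both share a sign, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u: list[float]) -> list[tuple[float, float]]:
    """Second-order MUSCL left/right states at interior interfaces of a 1D grid.

    u holds cell averages on a uniform grid. For each interface between cells
    i and i+1 (i = 1 .. len(u) - 3), both cells have two neighbours, so each
    gets a limited slope. The states are then extrapolated to the shared face.
    """
    states = []
    for i in range(1, len(u) - 2):
        slope_left_cell = minmod(u[i] - u[i - 1], u[i + 1] - u[i])
        slope_right_cell = minmod(u[i + 1] - u[i], u[i + 2] - u[i + 1])
        u_left = u[i] + 0.5 * slope_left_cell        # cell i, evaluated at its right face
        u_right = u[i + 1] - 0.5 * slope_right_cell  # cell i+1, evaluated at its left face
        states.append((u_left, u_right))
    return states


# Linear data is reproduced exactly; at a jump the limiter falls back to first order.
print(muscl_interface_states([0.0, 1.0, 2.0, 3.0, 4.0]))  # [(1.5, 1.5), (2.5, 2.5)]
print(muscl_interface_states([0.0, 0.0, 1.0, 1.0, 1.0]))  # [(0.0, 1.0), (1.0, 1.0)]
```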

2.
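For numerical simulation of the stall characteristics of a civil-aircraft high-lift configuration, we partitioned a multi-block structured grid with a partitioning tool based on a greedy load-balancing algorithm, and we ported, optimized, and tested the solver on a new supercomputer system. Using a computational grid on the order of 200 million cells, we carried out a large-scale parallel computing study. Load-balanced grid partitioning was tested at the ten-thousand-core scale, and a 4,096-core parallel computation of the stall characteristics of the high-lift configuration was achieved with a parallel efficiency above 50%. This improves the ability to simulate complex flow phenomena in engineering applications. The simulation results deepen the understanding of the stall flow mechanism of high-lift configurations and can serve as a useful reference for the design optimization of high-lift devices.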
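Entry 2 relies on a greedy load-balancing partitioner. Such a partitioner can be sketched with the classic largest-first greedy heuristic: sort the grid blocks by cell count, then give each block to the process that is currently least loaded. This is a hypothetical minimal version, and the function names and inputs are assumptions. The tool in the abstract may also split oversized blocks, which this sketch does not do.

```python
import heapq


def greedy_partition(block_sizes: list[int], n_procs: int) -> tuple[list[list[int]], list[int]]:
    """Assign grid blocks to processes with a largest-first greedy heuristic.

    Returns (assignment, loads): assignment[p] lists the block indices given
    to process p, and loads[p] is the total number of cells on process p.
    """
    heap = [(0, p) for p in range(n_procs)]  # (current load, process id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(n_procs)]
    loads = [0] * n_procs
    # Placing the largest blocks first leaves the small blocks to even out the loads.
    order = sorted(range(len(block_sizes)), key=lambda b: block_sizes[b], reverse=True)
    for b in order:
        load, p = heapq.heappop(heap)  # least-loaded process (ties: lowest id)
        assignment[p].append(b)
        loads[p] = load + block_sizes[b]
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads


def imbalance(loads: list[int]) -> float:
    """Maximum load divided by mean load; 1.0 means perfect balance."""
    return max(loads) / (sum(loads) / len(loads))


assignment, loads = greedy_partition([7, 5, 4, 3, 3, 2], n_procs=2)
print(assignment, loads, imbalance(loads))  # [[0, 3, 5], [1, 2, 4]] [12, 12] 1.0
```

The heap keeps each placement at O(log P) for P processes. The imbalance ratio is one simple way to judge how far a partition is from the ideal.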

3.
Full-Aircraft Numerical Simulation and Parallel Computing Based on Fluent   Total citations: 3, self-citations: 0, citations by others: 3
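The commercial CFD software Fluent was used to simulate the three-dimensional flow around an aircraft in subsonic flight, running in parallel. The runs yielded the flow field near the aircraft and demonstrated parallel execution of the software on a high-performance parallel computer. Results obtained with different grid sizes on clusters with different numbers of nodes were analyzed and compared. This comparison verifies the effectiveness of this commercial software on parallel platforms and provides a technical reference for large-scale scientific and engineering computation.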

4.
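The preprocessing program for laser-propulsion numerical simulation supplies grid information and sets boundary conditions for the main simulation program. Parallelizing it shortens the time needed to generate large-scale grids and thereby improves the efficiency of high-resolution simulation. The original serial algorithm was first improved by analyzing and exploiting the characteristics of its shared data, and was then parallelized. Test results show that the parallel algorithm effectively shortens grid generation time.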

5.
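Lattice quantum chromodynamics (lattice QCD) is an important theory and method for studying the interactions among microscopic particles such as quarks and gluons. Spacetime is discretized into a four-dimensional structured lattice, and the fundamental fields of QCD are defined on that lattice. This lets researchers use numerical simulation to study hadron interactions and properties from first principles, but the computational cost is enormous and requires large-scale parallel computing. The core of a lattice QCD computation is the lattice QCD solver, which is the main computational hotspot of the program. This paper studies the implementation and optimization of a lattice QCD solver on a domestic (Chinese) heterogeneous computing platform and presents a design and implementation of the solver. A BiCGSTAB solver was implemented, significantly reducing the number of iterations. Even-odd preconditioning was applied to reduce the size of the problem to be solved. The memory-access operations of the Dslash module were optimized for the characteristics of the domestic heterogeneous accelerator cards. Experiments show a speedup of about 30× over the solver before optimization, providing a useful reference for optimizing lattice QCD software on domestic heterogeneous supercomputers.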
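Entry 5 builds its solver on BiCGSTAB. For reference, the sketch below is a minimal unpreconditioned BiCGSTAB (van der Vorst's method) in plain Python, written against a generic `matvec` callback. In a lattice QCD code that callback would apply the (even-odd preconditioned) Dirac operator built on Dslash. Here a small dense nonsymmetric matrix stands in for it, and all names are illustrative assumptions.

```python
from typing import Callable

Vector = list[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def bicgstab(matvec: Callable[[Vector], Vector], b: Vector,
             tol: float = 1e-10, max_iter: int = 1000) -> tuple[Vector, int]:
    """Unpreconditioned BiCGSTAB for A x = b, starting from x = 0.

    Returns (x, iterations). Stops when ||r|| / ||b|| < tol.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # r0 = b - A*0
    r_hat = list(r)                  # fixed shadow residual
    b_norm = dot(b, b) ** 0.5 or 1.0
    if dot(r, r) ** 0.5 / b_norm < tol:
        return x, 0
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for k in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: rho = 0")
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 / b_norm < tol:  # half step already converged
            return [xi + alpha * pi for xi, pi in zip(x, p)], k
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if dot(r, r) ** 0.5 / b_norm < tol:
            return x, k
    return x, max_iter


def dense_matvec(A: list[list[float]]) -> Callable[[Vector], Vector]:
    """Stand-in operator; a lattice QCD code would apply the Dirac matrix here."""
    return lambda x: [dot(row, x) for row in A]


A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]  # nonsymmetric test matrix
x, iters = bicgstab(dense_matvec(A), [6.0, 15.0, 11.0])
print([round(xi, 6) for xi in x], iters)  # exact solution is [1, 2, 3]
```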

6.
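This paper introduces the application of high-performance parallel computing to CFD numerical simulation. High-performance parallel computing enlarges the problems CFD can solve and speeds up the solution, which makes it an inevitable trend for efficient CFD. Using the concept of a "numerical wind tunnel", the paper analyzes the application prospects of high-performance CFD and its demands on high-performance computing. The parallel efficiency and speedup of large-scale CFD runs on 8 to 256 CPUs are analyzed with a parallel test case of a waverider forebody. Parallel CFD is also applied to the simulation of a reentry capsule with high-temperature thermochemical nonequilibrium.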
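Entry 6 analyzes speedup and parallel efficiency for runs on 8 to 256 CPUs. When the smallest run is not a single CPU, both quantities are usually measured relative to that smallest run. With baseline count p0 and wall time T_p on p CPUs, speedup is S_p = T_p0 / T_p and efficiency is E_p = S_p · p0 / p. The small helper below computes both; its name and input format are assumptions, and the timings in the usage lines are hypothetical, not the paper's data.

```python
def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Relative speedup and parallel efficiency from wall-clock timings.

    timings maps CPU count -> wall time. The smallest CPU count p0 is the
    baseline: speedup S_p = T_p0 / T_p and efficiency E_p = S_p * p0 / p.
    """
    p0 = min(timings)
    t0 = timings[p0]
    rows = []
    for p in sorted(timings):
        speedup = t0 / timings[p]
        rows.append((p, speedup, speedup * p0 / p))
    return rows


# Hypothetical timings in seconds, not data from the paper.
for p, s, e in scaling_table({8: 800.0, 16: 400.0, 32: 250.0}):
    print(f"{p:4d} CPUs  speedup {s:5.2f}  efficiency {e:5.2f}")
```

With these made-up timings, the 32-CPU run has a speedup of 3.2 over 8 CPUs and an efficiency of 0.8.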

7.
Large-Scale Computation with a High-Order Parallel WCNS Code   Total citations: 1, self-citations: 0, citations by others: 1
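This paper achieves high-order-accurate numerical simulation of flow fields by solving the steady Reynolds-averaged Navier-Stokes equations in arbitrary coordinates with the SST two-equation turbulence model, using the fifth-order weighted compact nonlinear scheme (WCNS-E-5). The high-order simulation code was parallelized for distributed-memory systems using the MPI programming environment, non-blocking communication, and genetic-algorithm load balancing. The code was ported to and tested on the "Tianhe" system at the High Performance Computing Application Research Center of the National University of Defense Technology. Simulations of the DLR-F6 wing-body configuration confirmed that the parallelization strategy and implementation are correct. Finally, a high-order parallel simulation of a complete civil aircraft was performed on a grid of 100 million cells. This lays a solid foundation for large-scale engineering applications of the high-order parallel WCNS code.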

8.
Full-Aircraft Numerical Simulation and Parallel Computing Based on Cart3D   Total citations: 2, self-citations: 0, citations by others: 2
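The CFD software Cart3D was used to simulate the three-dimensional flow around an aircraft in subsonic flight, running in parallel. The runs yielded the flow field near the aircraft and demonstrated parallel execution of the software on a high-performance parallel computer. Comparison with results from other commercial software verifies that Cart3D is effective for such simulations and provides a technical reference for large-scale scientific and engineering computation.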

9.
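This paper reviews progress in China on parallel computing technology for large-scale reservoir simulation. It presents computational examples and application results of fine-scale reservoir simulation on a domestic Beowulf system. It also reports simulation results and a performance analysis for a reservoir case with about one million grid points, run on different numbers of processors. Finally, it implements a post-processing display system that visualizes massive data sets as 3D plots, 2D plots, and tables.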

10.
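Numerical Simulation and Parallel Computing of Flow Past Bodies Based on Fluent   Total citations: 6, self-citations: 0, citations by others: 6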
辛晓华 (Xin Xiaohua), 张武 (Zhang Wu), 周华 (Zhou Hua). 《计算机工程与设计》 (Computer Engineering and Design), 2005, 26(8): 2153-2154, 2200
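Two experiments on a two-dimensional incompressible flow problem were run with the commercial software Fluent, using grids of very different sizes. With the smaller grid, the parallel run performed worse than a single machine. With a sufficiently large grid, parallel computation showed its advantage. The aim was to examine the parallel computing capability of Fluent and to provide a technical reference for large-scale engineering computation.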

11.
Simulated annealing is known to be an efficient method for combinatorial optimization problems. Its use for realistic problem sizes, however, has been limited by long execution times due to its sequential nature. This report presents a practical approach to synchronous simulated annealing on massively parallel distributed-memory multiprocessors. We use an n-ary speculative tree to execute n different iterations in parallel on n processors, an approach called generalized speculative computation (GSC). Results for 100- to 500-city traveling salesman problems on the AP1000 massively parallel multiprocessor show that GSC can be an effective method for parallel simulated annealing, giving over 20-fold speedup on 100 processors.
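Entry 11 parallelizes the simulated-annealing loop with GSC. For readers unfamiliar with that loop, here is a plain sequential simulated-annealing sketch for the TSP that uses 2-opt (segment reversal) moves and geometric cooling. The move type, cooling schedule, and parameter values are assumptions. The speculative n-ary tree that evaluates several iterations at once on the AP1000 is not reproduced here.

```python
import math
import random


def tour_length(tour: list[int], pts: list[tuple[float, float]]) -> float:
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def anneal_tsp(pts, t0=1.0, cooling=0.998, steps=5000, seed=0):
    """Sequential simulated annealing for the TSP with 2-opt segment reversals."""
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    cur = tour_length(tour, pts)
    best, best_len = tour[:], cur
    t = t0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(pts)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse tour[i..j]
        cand_len = tour_length(cand, pts)
        delta = cand_len - cur
        # Metropolis rule: accept non-worsening moves, accept worse ones with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            tour, cur = cand, cand_len
            if cur < best_len:
                best, best_len = tour[:], cur
        t = max(t * cooling, 1e-9)  # geometric cooling with a floor so t never reaches 0
    return best, best_len


rng = random.Random(1)
cities = [(rng.random(), rng.random()) for _ in range(20)]
tour, length = anneal_tsp(cities)
print(round(tour_length(list(range(20)), cities), 3), "->", round(length, 3))
```

Each step depends on the state left by the previous step, which is why the abstract needs speculation to run iterations in parallel.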

12.
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by combining branch-and-bound pruning of the search tree with massively parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular that graph matching has a natural and efficient implementation on SIMD machines.
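Entry 12 combines branch-and-bound pruning with parallel search of the unpruned space. A sequential Python sketch of the branch-and-bound part follows. It uses one common formulation of graph matching, which is an assumption since the abstract does not fix one: find a vertex bijection between two graphs of equal size that minimizes the number of edge disagreements. The cost of a partial mapping can only grow as more vertices are placed, so it is a valid lower bound for pruning. The SIMD parallelization is not shown.

```python
def match_graphs(adj_a: list[list[int]], adj_b: list[list[int]]) -> tuple[list[int], int]:
    """Vertex bijection A -> B minimising the number of edge mismatches.

    adj_a and adj_b are symmetric 0/1 adjacency matrices of equal size.
    Returns (mapping, cost), where mapping[u] is the B-vertex matched to u.
    """
    n = len(adj_a)
    best_cost = n * n + 1          # larger than any possible mismatch count
    best_map: list[int] = []
    mapping: list[int] = []
    used = [False] * n

    def search(cost: int) -> None:
        nonlocal best_cost, best_map
        if cost >= best_cost:      # bound: a partial cost never decreases
            return
        k = len(mapping)           # next A-vertex to place
        if k == n:
            best_cost, best_map = cost, mapping[:]
            return
        for v in range(n):         # branch: try every unused B-vertex
            if used[v]:
                continue
            # Edge disagreements between k and the already-mapped A-vertices.
            extra = sum(adj_a[k][i] != adj_b[v][mapping[i]] for i in range(k))
            used[v] = True
            mapping.append(v)
            search(cost + extra)
            mapping.pop()
            used[v] = False

    search(0)
    return best_map, best_cost


path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2 (centre is vertex 1)
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path 1-0-2 (centre is vertex 0)
print(match_graphs(path_a, path_b))          # ([1, 0, 2], 0)
```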

13.
Modern complex embedded applications in many application fields impose stringent and continuously increasing functional and parametric demands. To serve these applications adequately, massively parallel multiprocessor systems on a single chip (MPSoCs) are required. This paper is devoted to the design of scalable communication architectures for massively parallel hardware multiprocessors for highly demanding applications. We demonstrate that in massively parallel hardware multiprocessors the influence of the communication network on both throughput and circuit area dominates that of the processors. We also show that traditionally used flat communication architectures do not scale well as parallelism increases. We therefore propose highly optimized, application-specific, partitioned hierarchical communication architectures that exploit the regularity and hierarchy of the actual information flows of a given application. We developed corresponding communication-architecture synthesis strategies and incorporated them into our quality-driven, model-based multiprocessor design methodology and its automated architecture exploration framework. Using this framework, we performed a large series of architecture synthesis experiments, some of whose results are presented in this paper. They demonstrate many features of the synthesized communication architectures. They also show that our method and framework can efficiently synthesize well-scalable communication architectures, even for high-end massively parallel multiprocessors that must satisfy extremely stringent computation demands.

14.
Parallel finite-element computation of 3D flows   Total citations: 5, self-citations: 0, citations by others: 5
The authors describe their work on massively parallel finite-element computation of compressible and incompressible flows on the CM-200 and CM-5 Connection Machines. Their computations are based on implicit methods, and their parallel implementations assume that the mesh is unstructured. Flow problems involving moving boundaries and interfaces are computed with the deformable-spatial-domain/stabilized-space-time method. Special mesh update schemes minimize the frequency of remeshing, which reduces the projection errors involved and makes the computations easier to parallelize. This method and its implementation on massively parallel supercomputers make it possible to solve a large class of practical problems involving free surfaces, two-liquid interfaces, and fluid-structure interactions.

15.
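§1. Introduction. Many large-scale scientific and engineering problems ultimately reduce to solving large sparse systems of linear equations. With high-performance parallel computers developing rapidly, research on efficient parallel algorithms for such systems in parallel computing environments is therefore especially important. For the large sparse linear system
Ax = b,   (1)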
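The text of entry 15 is truncated before any algorithm is described, so the following is not the paper's method. It is a generic illustration of why sparse solvers for Ax = b parallelize well. The sketch stores A in compressed sparse row (CSR) form and runs Jacobi iteration, in which every row update depends only on the previous iterate and can therefore be computed independently. It assumes that every diagonal entry is stored and nonzero, and that Jacobi converges for the matrix (strict diagonal dominance is sufficient). The function names are hypothetical.

```python
def dense_to_csr(A: list[list[float]]) -> tuple[list[float], list[int], list[int]]:
    """Compressed sparse row arrays: (values, column indices, row pointers)."""
    data, indices, indptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                data.append(a)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


def jacobi_csr(data, indices, indptr, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii on a CSR matrix.

    Each row update reads only the previous iterate, so the rows could be
    split across processes; here they are computed in a plain loop.
    Stops when the largest change between iterates falls below tol.
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = [0.0] * n
        for i in range(n):
            diag, off = 0.0, 0.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    off += data[k] * x[j]
            x_new[i] = (b[i] - off) / diag
        change = max(abs(xn - xo) for xn, xo in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]  # diagonally dominant
x, iters = jacobi_csr(*dense_to_csr(A), [6.0, 12.0, 14.0])
print([round(v, 6) for v in x], iters)  # exact solution is [1, 2, 3]
```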

16.
17.
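This paper presents an asymmetrically connected neural network model based on the p-adic number system. The network has a unique equilibrium point over the entire vector space, so it can obtain the optimal solution of a problem, and it is free of computational error. The network retains a highly parallel structure and can be used for algebraic symbolic computation. The paper focuses on methods for implementing the neural network, offering a new computational model for algebraic symbolic computation.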

18.
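The FFT (fast Fourier transform) is an efficient algorithm for computing the DFT (discrete Fourier transform) and is widely used in many scientific and engineering fields. Since the FFT appeared, many optimized algorithms have been studied, from early work on reducing complexity to large-scale parallel FFT computation in recent years. In parallel computing, programmable parallel GPUs have spread widely. The emergence of CUDA in particular, a general-purpose parallel computing architecture, has greatly increased GPU computing power and markedly improved GPU programming and optimization. Building on an analysis of FFT implementations, this paper studies a parallel FFT method suited to GPUs and implements it on the GPU with CUDA. Ignoring data transfer, the method in theory reduces the time complexity of the one-dimensional FFT from O(N log₂N) to O((N/r) log₂N). Verification shows that the proposed CUDA parallel FFT achieves good speedup and meets practical accuracy requirements, demonstrating the correctness and effectiveness of the method.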
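The O(N log₂N) cost in entry 18 comes from the structure of the radix-2 Cooley-Tukey algorithm. It runs log₂N stages, and each stage consists of N/2 butterflies that are independent of one another within that stage. That independence is the kind of work a GPU implementation can spread across threads. The sketch below is a sequential Python version for power-of-two lengths, checked against a direct DFT. It is an illustration, not the paper's CUDA code, and the function names are hypothetical.

```python
import cmath


def fft_radix2(x: list[complex]) -> list[complex]:
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    a = [complex(v) for v in x]
    # Bit-reversal permutation so that each stage combines adjacent sub-blocks.
    rev = 0
    for i in range(1, n):
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev |= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    # log2(n) stages; within one stage the n/2 butterflies are independent.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT, used only to check the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
err = max(abs(f - g) for f, g in zip(fft_radix2(signal), dft(signal)))
print(f"max |FFT - DFT| = {err:.2e}")
```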

19.
Context: Emerging multicores and clusters of multicores that may operate in parallel have set a new challenge: the development of massively parallel software composed of thousands of loosely coupled or even completely independent threads/processes, such as MapReduce and Java 3.0 workers, or Erlang processes, respectively. Testing and verification is a critical phase in the development of such software products.
Objective: Generating test cases based on operational profiles and certifying the declared operational reliability of a given software product is a well-established process for sequential software. This paper proposes an adaptation of that process for a class of massively parallel software: large-scale task trees.
Method: The proposed method uses statistical usage testing and operational reliability estimation based on operational profiles. It also introduces two novel test suite quality indicators, the percentage of different task trees and the percentage of different paths.
Results: As an example, the proposed method is applied to the operational reliability certification of a parallel software infrastructure named TaskTreeExecutor. To enable that application, the paper proposes an algorithm for generating random task trees. Test runs in the experiments involved hundreds and thousands of Win32/Linux threads, demonstrating the scalability of the proposed approach. For practitioners, the most useful result is the method for determining how many task trees and how many paths are needed to certify the given operational reliability of a software product. Practitioners may also use the proposed coverage metrics to measure the quality of automatically generated test suites.
Conclusion: This paper provides a useful solution for test case generation that enables the operational reliability certification process for a class of massively parallel software called large-scale task trees. The usefulness of this solution was demonstrated by a case study: the operational reliability certification of a real parallel software product.
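Entry 19 mentions an algorithm for generating random task trees and two coverage indicators: the percentage of distinct task trees and the percentage of distinct paths. The abstract does not define these precisely, so the sketch below makes hypothetical choices. A tree is an ordered nested tuple of its children, and a path is the sequence of branching factors met from the root down to a leaf. All names and parameters are illustrative.

```python
import random

Tree = tuple  # hypothetical encoding: a node is the tuple of its child nodes


def random_task_tree(rng: random.Random, max_depth: int, max_children: int) -> Tree:
    """Random ordered tree: each node gets 0..max_children children, depth-limited."""
    if max_depth == 0:
        return ()
    n_children = rng.randint(0, max_children)
    return tuple(random_task_tree(rng, max_depth - 1, max_children) for _ in range(n_children))


def root_to_leaf_paths(tree: Tree, prefix: tuple = ()):
    """Yield each root-to-leaf path as the tuple of branching factors along it."""
    here = prefix + (len(tree),)
    if not tree:
        yield here
        return
    for child in tree:
        yield from root_to_leaf_paths(child, here)


def suite_coverage(trees: list[Tree]) -> tuple[float, float]:
    """Percent of distinct trees and percent of distinct paths in a test suite."""
    if not trees:
        raise ValueError("empty test suite")
    paths = [p for t in trees for p in root_to_leaf_paths(t)]
    return (100.0 * len(set(trees)) / len(trees),
            100.0 * len(set(paths)) / len(paths))


rng = random.Random(42)
suite = [random_task_tree(rng, max_depth=3, max_children=3) for _ in range(50)]
tree_pct, path_pct = suite_coverage(suite)
print(f"distinct trees: {tree_pct:.1f}%  distinct paths: {path_pct:.1f}%")
```

Because the trees are nested tuples, they are hashable, so duplicate trees and duplicate paths can be counted directly with sets.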

20.
A massively parallel architecture called TOSCA (tokens sending cellular automaton) is presented that performs edge pixel detection and skeletonization for image processing. Each cell of this cellular automaton has a very small instruction set and a very small amount of memory. Computation is based on token propagation, counting devices, and local processing. The skeletonization method is based on the Chamfer distance.
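The skeletonization in entry 20 rests on the Chamfer distance. A conventional sequential way to compute that distance is the two-pass 3-4 chamfer transform sketched below. It uses weight 3 for edge neighbours and 4 for diagonal neighbours, which is an assumption since the abstract does not give the mask. TOSCA instead propagates tokens between cells, and the skeleton-extraction step is not shown. The names are hypothetical.

```python
INF = float("inf")

# (dy, dx, weight) for the neighbours already visited in each raster sweep.
FORWARD_MASK = [(-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)]
BACKWARD_MASK = [(1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)]


def chamfer_34(img: list[list[int]]) -> list[list[float]]:
    """3-4 chamfer distance from each foreground (1) pixel to the nearest background (0) pixel.

    Dividing the result by 3 roughly approximates Euclidean distance in pixels.
    Pixels stay at infinity if the image has no background pixel at all.
    """
    h, w = len(img), len(img[0])
    d = [[0 if img[y][x] == 0 else INF for x in range(w)] for y in range(h)]

    def relax(y: int, x: int, mask) -> None:
        for dy, dx, c in mask:
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and d[yy][xx] + c < d[y][x]:
                d[y][x] = d[yy][xx] + c

    for y in range(h):                  # forward pass: top-left to bottom-right
        for x in range(w):
            relax(y, x, FORWARD_MASK)
    for y in reversed(range(h)):        # backward pass: bottom-right to top-left
        for x in reversed(range(w)):
            relax(y, x, BACKWARD_MASK)
    return d


img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
for row in chamfer_34(img):
    print(row)  # centre pixel gets 6, the other object pixels get 3
```

The ridge of maximal distance values (here the single centre pixel) is what distance-based skeletonization methods build on.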
