Similar Documents
18 similar documents found (search time: 62 ms)
1.
刘凯, 寇正. 《微型机与应用》, 2003, 22(12): 12-14
Presents the functionality, execution model, and main directives of OpenMP, and applies OpenMP to the parallel optimization of a model describing particle motion.
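As a minimal illustration of the kind of loop-level OpenMP parallelization such an optimization rests on, here is a sketch of a parallel particle position update; the arrays, step size, and particle count are invented for the example.

```c
#include <omp.h>
#include <stdio.h>

#define N 100000

int main(void) {
    static double x[N], v[N];   /* positions and velocities (zero-initialized) */
    double dt = 1e-3;

    /* Each iteration is independent, so the loop divides cleanly among threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        x[i] += v[i] * dt;      /* advance particle i by one time step */
    }

    printf("threads available: %d\n", omp_get_max_threads());
    return 0;
}
```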

2.
With quad-core PCs reaching the market and an 80-core processor demonstrated in the laboratory, multicore is driving fundamental changes in software development. Developers must add threads to their code to exploit the multiple cores a system provides and thereby improve the functionality and performance of PC applications. This paper discusses implementation techniques for parallel computing on multicore PCs. It introduces the model, directives, and library functions of OpenMP, the parallel programming interface for shared-memory systems, as well as the OpenMP support in the Intel C++ Compiler 9.1 and Microsoft Visual Studio 2005; focuses on the design, implementation, and optimization of a parallel algorithm for the two-dimensional discrete fast Fourier transform; and looks ahead to the development of high-performance parallel software component libraries.
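To make the parallel structure concrete, the sketch below parallelizes the row pass of a generic 2-D transform; `fft_1d` is a stub standing in for a real 1-D FFT, and this is the general pattern, not the paper's algorithm. Compile with `-fopenmp` (GCC) or `/openmp` (Visual Studio 2005).

```c
#include <omp.h>
#include <stdio.h>

#define ROWS 256
#define COLS 256

static double data[ROWS][COLS];

/* Stub standing in for a real 1-D FFT over one row. */
static void fft_1d(double *row, int n) {
    for (int i = 0; i < n; i++) row[i] *= 0.5;
}

int main(void) {
    /* Row pass of a 2-D transform: rows are independent, so each thread
     * takes a share of them; the column pass would follow the same pattern. */
    #pragma omp parallel for schedule(static)
    for (int r = 0; r < ROWS; r++)
        fft_1d(data[r], COLS);

    printf("rows processed: %d\n", ROWS);
    return 0;
}
```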

3.
Studies a parallel ant colony algorithm for multicore architectures based on OpenMP. Experiments on the TSP show that the algorithm is easy to apply, makes full use of the parallel computing capability of multicore processors, and improves the algorithm's running efficiency.
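A minimal sketch of the parallel structure described, assuming one colony iteration in which each ant builds its tour independently; `construct_tour` is a stub, not a real ACO implementation, and the ant count is invented.

```c
#include <omp.h>
#include <float.h>
#include <stdio.h>

#define ANTS 64

/* Stub standing in for real probabilistic tour construction. */
static double construct_tour(int ant) { return 1000.0 + ant % 7; }

int main(void) {
    double best = DBL_MAX;

    /* Ants build tours independently, so the loop parallelizes;
     * only the best-tour update is serialized. */
    #pragma omp parallel for
    for (int ant = 0; ant < ANTS; ant++) {
        double len = construct_tour(ant);
        #pragma omp critical
        if (len < best) best = len;
    }

    printf("best tour length: %f\n", best);
    return 0;
}
```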

4.
5.
Although OpenMP, a technology that arose on desktop systems, is now very mature on the PC platform, development in the embedded field, and on Android in particular, has largely remained in the traditional single-core mode. Google's NDK R9 provides support for the OpenMP library. This paper describes the use of OpenMP on Android and corrects the problems encountered.
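As a sketch of what using OpenMP from Android native code looks like, the JNI function below runs a reduction across the device's cores. The Java package and method names are hypothetical, and the `Android.mk` flags in the comment are the usual ones for the NDK's GCC toolchain of that era; verify them against your NDK version.

```c
/* Build (ndk-build, GCC toolchain) -- assumed flags, check your NDK:
 *   LOCAL_CFLAGS  += -fopenmp
 *   LOCAL_LDFLAGS += -fopenmp
 */
#include <jni.h>
#include <omp.h>

/* Hypothetical JNI entry point: sums i^2 for i in [0, n) in parallel. */
JNIEXPORT jdouble JNICALL
Java_com_example_demo_MainActivity_sumSquares(JNIEnv *env, jobject thiz, jint n) {
    double sum = 0.0;
    /* The reduction spreads the loop across the device's cores. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += (double)i * i;
    return sum;
}
```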

6.
程栋, 王卫红. 《计算机科学》, 2017, 44(Z6): 161-163, 187
SAR images carry large volumes of data, and conventional recognition algorithms are complex and slow, making real-time requirements hard to meet. To address this, a SAR image target classification algorithm based on OpenMP multicore computing is proposed. Building on an analysis of template-matching-based SAR target classification, a parallel image-processing framework based on OpenMP multicore computing is given and used to parallelize the classification algorithm. Finally, classification experiments on three target classes show that SAR image classification runs 8 times faster, demonstrating the method's effectiveness.
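A sketch of the parallel template-matching pattern the abstract describes, with a stub score function in place of real SAR correlation; the three templates mirror the three target classes, but everything else is invented for illustration.

```c
#include <omp.h>
#include <float.h>
#include <stdio.h>

#define TEMPLATES 3   /* one template per target class, as in the experiment */

/* Stub standing in for a real template-match score (e.g. correlation). */
static double match_score(int template_id) { return (double)(template_id % 5); }

int main(void) {
    double best_score = -DBL_MAX;
    int best_class = -1;

    /* Scores against different templates are independent, so the matching
     * loop parallelizes; only the arg-max update is serialized. */
    #pragma omp parallel for
    for (int t = 0; t < TEMPLATES; t++) {
        double s = match_score(t);
        #pragma omp critical
        if (s > best_score) { best_score = s; best_class = t; }
    }

    printf("predicted class: %d\n", best_class);
    return 0;
}
```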

7.
Developing an effective programming method for multicore platforms has become an important goal of parallel software research. OpenMP parallel programs were implemented and run effectively on an embedded multicore platform. Given the limited memory resources of embedded systems, a custom OpenMP directive extension, tiling, is proposed to improve the running efficiency of parallel programs on embedded multicore platforms. The extended OpenMP parallel programs support loop tiling and can therefore make full use of the hierarchical …
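Standard OpenMP has no `tiling` directive, so the paper's extension cannot be reproduced here; as a stand-in, this sketch shows the equivalent manual loop tiling that such a directive would presumably expand into, with an invented tile size.

```c
#include <omp.h>
#include <stdio.h>

#define N 1024
#define TILE 64   /* tile sized to fit a small embedded on-chip memory */

static double a[N][N];

int main(void) {
    /* Manual loop tiling under standard OpenMP; the custom `tiling`
     * directive would generate a structure like this automatically. */
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    a[i][j] *= 2.0;   /* touch one cache-sized tile at a time */

    printf("%f\n", a[0][0]);
    return 0;
}
```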

8.
To increase the computation speed of molecular dynamics simulations on shared-memory servers, the performance of an OpenMP-based parallel MD algorithm (the Critical method) was analyzed and optimized. Tests on a multicore server, together with speedup and parallel-efficiency analysis of the Critical method, motivated an optimized triangle method. In the proposed method each thread computes a fixed number of particles, with the counts rising in a stair-step pattern so that the threads reach the critical section at staggered times. The program's idle time at the critical section is thereby halved relative to the Critical method, and the speedup improves markedly.
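A minimal sketch of the Critical-style accumulation the abstract starts from, with a dummy pairwise force: note how the triangular pair loop makes per-iteration work uneven, which is exactly the imbalance the proposed triangle method addresses with stair-stepped particle counts per thread.

```c
#include <omp.h>
#include <stdio.h>

#define N 512

static double pos[N], force[N];

/* Dummy pairwise force standing in for a real interatomic potential. */
static double pair_force(double xi, double xj) { return xj - xi; }

int main(void) {
    /* Critical-style accumulation: pair (i, j) updates both particles,
     * so writes to the shared force array are serialized. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            double f = pair_force(pos[i], pos[j]);
            #pragma omp critical
            { force[i] += f; force[j] -= f; }
        }

    printf("force[0] = %f\n", force[0]);
    return 0;
}
```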

9.
To increase the speed of molecular dynamics simulations on multicore shared-memory servers, a Multi-Critical algorithm is proposed on the basis of existing parallel MD algorithms. The algorithm partitions the force matrix manually so that different threads enter differently named critical sections, and it optimizes the parallel algorithm with block-wise accumulation, improving parallel efficiency. Experiments show that, compared with the earlier Critical algorithm, both the speedup and the parallel efficiency improve substantially.
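OpenMP's named critical sections are the standard feature the Multi-Critical idea builds on. Below is a generic sketch in which two invented force partitions use differently named sections, so threads working on different partitions do not block one another; the partitioning scheme here is illustrative, not the paper's force-matrix division.

```c
#include <omp.h>
#include <stdio.h>

#define N 512

static double force[2][N];   /* two manually partitioned halves */

int main(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        double f = 1.0;      /* dummy force contribution */
        if (i % 2 == 0) {
            /* Differently named critical sections never block each other;
             * only threads contending for the same name serialize. */
            #pragma omp critical(part_even)
            force[0][i / 2] += f;
        } else {
            #pragma omp critical(part_odd)
            force[1][i / 2] += f;
        }
    }
    printf("%f %f\n", force[0][0], force[1][0]);
    return 0;
}
```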

10.
Research on the Runtime Performance of OpenMP Multithreaded Applications on Multicore Architectures
On multicore platforms, the dynamic migration of OpenMP threads between cores can degrade application performance to some extent. Binding each thread to a fixed core, so that it no longer migrates, can potentially improve performance and fully exploit the platform's computing power. This paper describes how to bind OpenMP threads to cores through mainstream compilers' binding interfaces and through the Linux kernel API, and uses the STREAM benchmark and NPB to measure and compare application performance before and after binding on blades of the "Magic Cube" (魔方) supercomputer at the Shanghai Supercomputer Center. The results show that binding can improve the performance of OpenMP applications.
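A minimal sketch of the Linux-API route, assuming at least as many cores as threads: each OpenMP thread pins itself with `sched_setaffinity` (a pid of 0 binds the calling thread). Compiler-side mechanisms such as the `GOMP_CPU_AFFINITY` or `KMP_AFFINITY` environment variables achieve the same effect without code changes.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(omp_get_thread_num(), &set);      /* thread t -> core t */
        sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
        printf("thread %d pinned\n", omp_get_thread_num());
    }
    return 0;
}
```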

11.
12.
Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters
Nowadays, NVIDIA's CUDA is a general-purpose scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming many-core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to an MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node.
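To keep this document's examples in one language, the sketch below shows only the MPI and OpenMP levels of such a hybrid partitioning in C, with a stub where the real code would launch a CUDA kernel on the node's GPU; the iteration total and split arithmetic are invented.

```c
#include <mpi.h>
#include <omp.h>

#define TOTAL 1000000

/* Stub standing in for a CUDA kernel launch on this node's GPU. */
static void process_chunk(int begin, int end) { (void)begin; (void)end; }

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI level: split the iteration space across GPU nodes. */
    int chunk = TOTAL / size;
    int begin = rank * chunk;
    int end   = (rank == size - 1) ? TOTAL : begin + chunk;

    /* OpenMP level: host cores subdivide this node's share; each piece
     * would be handed to the GPU in the real hybrid code. */
    #pragma omp parallel
    {
        int t = omp_get_thread_num(), nt = omp_get_num_threads();
        int span = (end - begin) / nt;
        int lo = begin + t * span;
        int hi = (t == nt - 1) ? end : lo + span;
        process_chunk(lo, hi);
    }

    MPI_Finalize();
    return 0;
}
```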

13.
This paper presents real-time image processing applications using multicore and multiprocessing technologies. To this end, parallel image segmentation was performed on many images covering the entire surface of the same metallic, cylindrical moving objects. Experimental results on a multicore CPU with the OpenMP platform showed that increasing the chunk size decreases the execution time roughly fourfold compared with serial computing. The same experiments were implemented on GPGPU using four techniques: (1) single image transmission with single pixel processing; (2) single image transmission with multiple pixel processing; (3) multiple image transmission with single pixel processing; and (4) multiple image transmission with multiple pixel processing. All techniques were implemented on GeForce, Tesla K20, and Tesla K40 cards. Experimental results on GPU with the CUDA platform showed that increasing the core count increases the speedup. The Tesla K40 gave the best results: improvements of 35x and 12x (first technique), 36x and 13x (second), 54x and 16x (third), and 71x and 17x (fourth) over serial computing, without and with data transmission time respectively. Users are therefore advised to use the Tesla K40 GPU with multiple image transmission and multiple pixel processing to obtain maximum performance.
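A sketch of the CPU-side chunk-size experiment, using a generic threshold segmentation in place of the paper's algorithm; the image dimensions and chunk value are invented, and varying `CHUNK` is what the reported timing study does.

```c
#include <omp.h>
#include <stdio.h>

#define W 1920
#define H 1080
#define CHUNK 64   /* rows handed out per scheduling step; the experiments vary this */

static unsigned char img[H][W];

int main(void) {
    /* Dynamic scheduling dispenses CHUNK rows at a time; larger chunks cut
     * scheduling overhead, which is what reduced execution time on the CPU. */
    #pragma omp parallel for schedule(dynamic, CHUNK)
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            img[y][x] = img[y][x] > 128 ? 255 : 0;   /* simple threshold segmentation */

    printf("done\n");
    return 0;
}
```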

14.
15.
The frontal solution method has proven to be an effective means of solving the matrix equations resulting from the application of the finite element method to a variety of problems. In this study, several versions of the frontal method were compared for efficiency on several hydrodynamics problems. Three basic modifications were shown to be of value: (1) elimination of equations with boundary conditions beforehand; (2) modification of the pivoting procedures to allow dynamic management of the equation size; and (3) storage of the eliminated equations in a vector. These modifications are sufficiently general to be applied to other classes of problems.

16.
《电子技术应用》, 2016(1): 31-33
A multicore software development method based on the Multicore Communications API (MCAPI) standard is proposed. The standard provides message-passing APIs suited to inter-core communication and greatly improves the portability of applications across multicore processors. Development uses the poly-platform software tool: first the topology is built, then node projects are defined and memory allocation completed, then inter-node communication is implemented from MCAPI templates, and finally each node's application is written. The flow is independent of vendor, device, and operating system, maps applications quickly and flexibly onto different homogeneous and heterogeneous multicore architectures, and greatly improves the efficiency of multicore software development.
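As a rough sketch of MCAPI-style message passing between two nodes: the calls below follow my recollection of the MCAPI 1.x specification, and the node and port numbers are made up, so treat the exact names and signatures as assumptions to be checked against the poly-platform headers.

```c
#include <mcapi.h>   /* assumed header name from the MCAPI distribution */
#include <string.h>

/* Node 1 sends a message to an endpoint on node 2; identifiers are illustrative. */
void send_hello(void) {
    mcapi_version_t version;
    mcapi_status_t  status;

    mcapi_initialize(1 /* this node's id */, &version, &status);

    /* Local endpoint to send from, remote endpoint to send to. */
    mcapi_endpoint_t local  = mcapi_create_endpoint(100 /* port */, &status);
    mcapi_endpoint_t remote = mcapi_get_endpoint(2 /* node */, 200 /* port */, &status);

    const char msg[] = "hello";
    mcapi_msg_send(local, remote, (void *)msg, strlen(msg) + 1,
                   1 /* priority */, &status);

    mcapi_finalize(&status);
}
```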

17.
In the current study, a meshfree Lagrangian particle method for the Landau–Lifshitz Navier–Stokes (LLNS) equations is developed. The LLNS equations incorporate thermal fluctuations into macroscopic hydrodynamics through the addition of white-noise fluxes whose magnitudes are set by a fluctuation–dissipation theorem. The study focuses on capturing the correct variances and correlations computed for equilibrium flows, which are compared with available theoretical values. Moreover, a numerical test of the random walk of a standing shock wave is considered to verify that the shock location is captured.
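For reference, the magnitude of the white-noise stress flux is fixed by the standard Landau–Lifshitz fluctuation–dissipation relation, quoted here from the general theory rather than from the paper itself (notation assumed: η and ζ are the shear and bulk viscosities, T the temperature): the stochastic stress has zero mean and covariance

```latex
\left\langle \mathcal{S}_{ij}(\mathbf{r},t)\,\mathcal{S}_{kl}(\mathbf{r}',t') \right\rangle
  = 2 k_B T \left[ \eta \left( \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} \right)
  + \left( \zeta - \tfrac{2}{3}\eta \right) \delta_{ij}\delta_{kl} \right]
  \delta(\mathbf{r}-\mathbf{r}')\,\delta(t-t')
```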

18.
Despite its ease of use, OpenMP has failed to gain widespread use on large-scale systems, largely because it has failed to deliver sufficient performance. Our experience indicates that the cost of initiating OpenMP regions is simply too high for the OpenMP usage scenario desired by many applications. In this paper, we introduce CLOMP, a new benchmark to characterize this aspect of OpenMP implementations accurately. CLOMP complements the existing EPCC benchmark suite by providing simple, easy-to-understand measurements of OpenMP overheads in the context of application usage scenarios. Our results for several OpenMP implementations demonstrate that CLOMP identifies the amount of work required to compensate for the overheads observed with EPCC. We also show that CLOMP captures limitations of OpenMP parallelization on SMT and NUMA systems. Finally, CLOMPI, our MPI extension of CLOMP, demonstrates which aspects of OpenMP interact poorly with MPI when MPI helper threads cannot run on the NIC.
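Neither CLOMP nor EPCC is reproduced here, but the kind of overhead they measure can be sketched in a few lines: time many empty parallel regions and divide by the count. The repetition count is arbitrary.

```c
#include <omp.h>
#include <stdio.h>

#define REPS 10000

int main(void) {
    double t0 = omp_get_wtime();
    for (int r = 0; r < REPS; r++) {
        #pragma omp parallel
        {
            /* no work: any elapsed time is region start-up/teardown cost */
        }
    }
    double per_region = (omp_get_wtime() - t0) / REPS;
    printf("overhead per parallel region: %.3f us\n", per_region * 1e6);
    return 0;
}
```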
