首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 156 毫秒
1.
多核微机基于OpenMP的并行计算   总被引:5,自引:2,他引:5  
随着四核微机走向市场和八十核处理器在实验室研制成功,多核正引领软件研发发生基础性变化。开发人员需要在代码中添加线程来利用系统所提供的多个内核,从而提升PC应用软件的功能和性能。文中探讨在多核微机上进行并行计算的实现技术。介绍了共享存储系统并行编程接口OpenMP的模型、指令和库函数,以及Intel C++编译器9.1和Microsoft Visual Studio 2005等对OpenMP的支持;着重探讨了二维离散快速傅里叶变换并行算法的设计、实现与优化技术;展望了高性能并行计算软构件库的开发前景。  相似文献   

2.
基于多核微机的微粒群并行算法   总被引:4,自引:1,他引:3       下载免费PDF全文
提出了一种基于Logistic模型的惯性权重非线性调整策略,采用OpenMP多线程编程,在微机上实现了微粒群算法的多核并行计算。通过对BenchMark测试函数集中的5个函数进行测试,试验结果表明,采用基于Logistic模型的惯性权重非线性调整策略在算法成功率和收敛代数都优于线性调整策略,而基于OpenMP的微粒群多核并行计算使得计算速度得到提高。  相似文献   

3.
随着四核微机走向市场和八十核处理器在实验室研制成功,多核正引领软件研发发生基础性变化。开发人员需要在代码中添加线程来利用系统所提供的多个内核,从而提升PC应用软件的功能和性能。本文探讨在多核微机上进行并行计算的实现技术,介绍基于基本线程类的多线程类的设计,包括属性、方法和事件的设计,着重探讨多个线程的同步核互斥问题。在基于多线程类的基础上,简要探讨VCL控件和ActiveX控件的实现方法。最后,展望了高性能并行计算软构件库的开发前景。  相似文献   

4.
为了达到提高颗粒流体动力学方法 GHM计算效率的目标,分析了GHM模型的主要计算模块,抽取其中的可并行计算模块,基于多核计算机的硬件环境,应用OpenMP多线程并行计算模型,对采用数值积分方法求解颗粒运动方程的部分,实现求解过程的并行计算。最后通过多次实验验证程序的正确性及算法性能。实验结果表明,在Windows 7系统4核8线程处理器的计算机上,并行程序的并行加速比最高达到了2.5,说明OpenMP多核并行技术能较显著地提高GHM方法的计算性能。  相似文献   

5.
多核并行技术在分子动力学模拟中的应用   总被引:1,自引:0,他引:1  
为了充分利用多核处理器资源,研究了一种用于分子动力学模拟中的多核并行技术。在多核处理器上利用OpenMP技术实现多线程创建与同步、动态设置子线程的调度运行方式以及负载均衡以减少子线程执行等待时间。通过对不同分子体系结构下的动力学模型测试,得出在不同子线程下并行计算的时间,并且得到了良好的性能加速比。实验结果表明,采用OpenMP并行技术可有效地提高电荷求解过程在分子动力学模拟运算中的时间效率,以及多核计算机资源的利用率。  相似文献   

6.
OpenMP是现代多核机群系统采用的主要并行编程模型之一,在单CPU多核上可以获得良好的加速性能,但在整个机群系统上使用时,需要解决可扩展性差的问题.首先设计了求解非平衡动力学方程的并行算法.基于分布共享的多核机群系统,采用显式数据分布OpenMP并行计算方法,将数据进行分布式划分,分配到每个OpenMP线程,通过数据共享实现数据交换.计算结果表明显式OpenMP并行程序在保持可读性的同时,具有良好的可扩展性,在4核Xeon处理器构成的分布共享机群系统上,非平衡动力学方程组的数值并行计算可以扩展到1024个CPU核,具有明显的并行加速计算效果.  相似文献   

7.
粒子滤波中大量的粒子计算使得算法的实时性较差。由于粒子滤波本身具有可并行化的特点,因此利用OpenMP多线程库派生出多个线程,将算法过程由单线程串行执行转变为多线程并行执行。用多核并行计算技术实现粒子滤波运动目标的跟踪。实验结果表明:基于多核的并行计算技术提高了粒子滤波算法的计算效率。  相似文献   

8.
多核环境下AREM模式混合并行计算研究   总被引:1,自引:1,他引:0       下载免费PDF全文
使用多核处理器已成为构建高性能计算机系统的主流方式。结合多核高性能计算机系统集共享内存结构和分布式内存结构于一体的体系结构特点,对AREM模式开展MPI/OpenMP混合并行计算研究与实现。性能测试结果表明,使用MPI/OpenMP混合并行计算可以将并行应用扩展至更大处理机规模,缩短计算时间,不对原程序结构做大的改动、以增量方式和较小的并行化代价,取得比较好的并行计算效果。  相似文献   

9.
程栋  王卫红 《计算机科学》2017,44(Z6):161-163, 187
SAR图像数据量大,常规识别算法复杂、处理耗时,难以满足实时性要求。针对这一问题,提出一种基于OpenMP多核计算的SAR图像目标分类算法。在分析基于模板匹配的SAR图像目标分类算法的基础上,给出基于OpenMP多核计算技术的图像处理并行处理框架,实现SAR图像目标分类算法的并行计算。最后,采用所提方法对3类目标进行分类识别实验,SAR图像分类识别的处理速度提高了8倍,表明了该方法是有效的。  相似文献   

10.
王文义  冉晓龙 《计算机科学》2015,42(8):28-31, 59
着重分析了多核架构系统中内存对齐技术与cache利用率等因素对并行程序性能的影响。用共享存储环境OpenMP分析了并行计算量与处理器核心数目之间的关系,通过用MPI编程实现的矩阵相乘的行划分和CANNON算法等实例分析,指出了只有综合考虑了多核系统的结构特征、系统软件、多核编程语言环境以及正确运用算法等,才能设计出高效且能耗又小的并行应用程序。  相似文献   

11.
In light of GPUs’ powerful floating-point operation capacity,heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing(HPC).However,due to the complexity of programming on GPUs,porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big challenge.The OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific computing.To effectively inherit existing OpenMP applications and reduce the transplant cost,we extend OpenMP with a group of compiler directives,which explicitly divide tasks among the CPU and the GPU,and map time-consuming computing fragments to run on the GPU,thus dramatically simplifying the transplantation.We have designed and implemented MPtoStream,a compiler of the extended OpenMP for AMD’s stream processing GPUs.Our experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system,incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU,over the execution on the Xeon CPU alone.  相似文献   

12.
陈欢  谢健 《计算机科学》2012,39(106):392-395
随着多核处理器的普及,并为了充分利用多核PC机的特性,计算机技术逐渐向多核架构及多核计算技术发展。为提高对湖南地区100mX 100m小网格气温插值的速度,采用以OpenMP为标准的基于共享存储的并行编程模型对Kriging插值算法进行改进。在不同核的多核PC机中,采用100mX 100m小网格和500mX 500m小网格地形数据对平均气温进行插值,不仅有效减少了插值时间和提高了算法的加速比,而且集成到业务系统中大大提升了系统的反应时间及性能。  相似文献   

13.
The Cell processor is a heterogeneous multi-core processor with one power processing engine (PPE) core and eight synergistic processing engine (SPE) cores. There is a significant amount of ongoing research in programming models and tools that attempts to make it easy to exploit the computation power of the Cell architecture. In our work, we explore supporting OpenMP on the Cell processor. It is attractive to support OpenMP because programmers can continue using their familiar programming model, and existing code can be re-used. We base our work on IBM’s XL compiler, and developed new components in the XL compiler and a new runtime library. Three major issues are addressed: (1) synchronization support on heterogeneous cores; (2) code generation targeting the different instruction sets; (3) data transfers and implement the OpenMP memory model. We present experimental results for some SPEC OMP 2001 and NAS benchmarks to demonstrate the effectiveness of this approach. A visualization tool based on Paraver is also used to provide some insights into actual thread and synchronization behaviors.  相似文献   

14.
Abstract Multi-core digital signal processors (DSPs) are widely used in wireless telecommunication, core network transcoding, industrial control, and audio/video processing technologies, among others. In comparison with general-purpose multi-processors, multi-core DSPs normally have a more complex memory hierarchy, such as on-chip core-local memory and non-cache-coherent shared memory. As a result, efficient multi-core DSP applications are very difficult to write. The current approach used to program multi-core DSPs is based on proprietary vendor software development kits (SDKs), which only provide low-level, non-portable primitives. While it is acceptable to write coarse-grained task-level parallel code with these SDKs, writing fine-grained data parallel code with SDKs is a very tedious and error-prone approach. We believe that it is desirable to possess a high-level and portable parallel programming model for multi-core DSPs. In this paper, we propose OpenMDSP, an extension of OpenMP designed for multi-core DSPs. The goal of OpenMDSP is to fill the gap between the OpenMP memory model and the memory hierarchy of multi-core DSPs. We propose three classes of directives in OpenMDSP, including 1) data placement directives that allow programmers to control the placement of global variables conveniently, 2) distributed array directives that divide a whole array into sections and promote the sections into core-local memory to improve performance, and 3) stream access directives that promote big arrays into core-local memory section by section during parallel loop processing while hiding the latency of data movement by the direct memory access (DMA) of a DSP. We implement the compiler and runtime system for OpenMDSP on PreeScale MSC8156. The benchmarking results show that seven of nine benchmarks achieve a speedup of more than a factor of 5 when using six threads.  相似文献   

15.
事务存储并行程序编程接口研究   总被引:1,自引:0,他引:1       下载免费PDF全文
事务存储并行程序编程接口按照实现方式和实现层次的不同,分为三种形式:库函数接口、语言扩展和编译器指导命令。本文以RSTM、英特尔C/C++软件事务存储编译器原型和OpenTM为例,讨论了三种事务存储编程接口的特点,对OpenTM编程接口进行了扩展和完善,并对未来编程接口的发展进行了展望。  相似文献   

16.
机群OpenMP系统的设计与实现   总被引:5,自引:0,他引:5  
OpenMP以其易用性和支持增量并行的特点成为共享存储体系结构的编程标准.目前机群系统已成为高性能计算的主流平台,研究机群OpenMP系统对推进并行应用的开发和普及非常有意义.该文作者以软件DSM系统JIAJIA作为OpenMP的运行时系统,结合一个前端编译器OMP2JIA,在机群系统上实现了OpenMP/JIAJIA计算环境,同时在提高性能方面根据机群系统特点扩展了OpenMP制导,优化了后端运行时库。通过11个OpenMP应用,作者比较了该计算环境和一个支持OpenMP的硬件cc-NUMA系统(SGI 2100)的性能.结果表明,作者的机群OpenMP系统的7机平均加速比为4.62;SGI 2100系统为4.55,二者性能相当.  相似文献   

17.
This paper describes compiler techniques that can translate standard OpenMP applications into code for distributed computer systems. OpenMP has emerged as an important model and language extension for shared-memory parallel programming. However, despite OpenMP's success on these platforms, it is not currently being used on distributed system. The long-term goal of our project is to quantify the degree to which such a use is possible and develop supporting compiler techniques. Our present compiler techniques translate OpenMP programs into a form suitable for execution on a Software DSM system. We have implemented a compiler that performs this basic translation, and we have studied a number of hand optimizations that improve the baseline performance. Our approach complements related efforts that have proposed language extensions for efficient execution of OpenMP programs on distributed systems. Our results show that, while kernel benchmarks can show high efficiency of OpenMP programs on distributed systems, full applications need careful consideration of shared data access patterns. A naive translation (similar to OpenMP compilers for SMPs) leads to acceptable performance in very few applications only. However, additional optimizations, including access privatization, selective touch, and dynamic scheduling, resulting in 31% average improvement on our benchmarks.  相似文献   

18.
多层次并行体绘制算法的研究与应用   总被引:1,自引:0,他引:1  
三维数据场的体绘制技术是科学可视化中一个重要的研究方向,本文在研究和总结体绘制的发展历程与关键技术的基础之上,着重研究了体绘制中的光线投射算法,结合多核处理器机群系统,提出并实现了一种基于多层次并行编程模型的并行光线投射体绘制算法,并成功地将该算法应用于三维城市浅层地质模型,取得了良好的可视化效果。分别对MPI环境和多层次并行编程MPI+OpenMP环境下的光线投射算法进行了不同计算规模的性能比较实验。实验和分析表明,多层次并行光线投射体绘制算法加快了体绘制的速度,MPI+OpenMP多层次并行模型性能高于纯MPI编程模型的性能。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号