Similar Documents
18 similar documents found (search time: 468 ms)
1.
A Hybrid MPI+OpenMP Programming Model for SMP Clusters and Its Efficient Implementation   (Cited by: 12; self-citations: 1; others: 11)
SMP clusters mix two memory models: each node is a shared-memory multiprocessor, while distributed memory is used between nodes. This multi-level architecture raises both programming-model and performance issues. The paper discusses the performance of the hybrid MPI+OpenMP programming model and its different implementation approaches, and proposes a multi-granularity hybrid MPI+OpenMP programming method. A multi-granularity hybrid parallel algorithm for the symmetric tridiagonal eigenproblem is constructed and compared, on the DeepComp 6800 supercomputer, with a pure MPI algorithm. The results show that the hybrid parallel algorithm achieves better scalability and speedup.
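The two-level decomposition this entry describes (MPI ranks across nodes, an OpenMP thread team within each node) can be sketched conceptually in Python. Both levels are shown with thread pools purely for portability, so this is an analogy of the structure, not the paper's actual MPI+OpenMP implementation; all names are illustrative:

```python
# Conceptual sketch of two-level (hybrid) parallel decomposition:
# the outer level stands in for MPI ranks across SMP nodes, the inner
# level for an OpenMP thread team inside one node.
from concurrent.futures import ThreadPoolExecutor

def team_sum(chunk, n_threads=2):
    # Inner ("OpenMP") level: the node's share is split among a thread team.
    step = max(1, len(chunk) // n_threads)
    parts = [chunk[i:i + step] for i in range(0, len(chunk), step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum, parts))

def hybrid_sum(data, n_ranks=2, n_threads=2):
    # Outer ("MPI") level: the global data is split among ranks;
    # each rank then reduces its share with its own thread team.
    step = max(1, len(data) // n_ranks)
    shares = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_ranks) as pool:
        return sum(pool.map(team_sum, shares))

print(hybrid_sum(list(range(1000))))  # 499500
```

In the real model, the outer reduction would be an `MPI_Reduce` across processes and the inner loop an OpenMP worksharing construct.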

2.
Research on Hierarchical Parallel Programming Techniques for SMP Clusters   (Cited by: 2; self-citations: 0; others: 2)
Zhu Yongzhi, Zhang Dandan, Cao Baoxiang, Yu Jiguo. Acta Electronica Sinica (《电子学报》), 2012, 40(11): 2206-2210
Targeting the architectural characteristics of multi-core SMP clusters, this paper discusses MPI+OpenMP hybrid parallel programming techniques and proposes a new multi-level hybrid design method. A multi-level parallel algorithm for the N-body problem is designed and compared, on the Dawning 5000A cluster, with the traditional hybrid algorithm. The results show that the hierarchical hybrid parallel algorithm achieves better scalability and speedup.

3.
To address the shortcomings of the traditional MPI-based parallel FDTD algorithm, an optimized two-level parallel FDTD algorithm is proposed. Combining the characteristics of the MPI and OpenMP programming models, two parallel FDTD algorithms based on a hybrid MPI-OpenMP programming model are implemented on an SMP cluster platform. On an SMP cluster built in the laboratory, the hybrid algorithms are compared with the MPI-based parallel FDTD algorithm on the scattering problem of a metallic rectangular box. The results show that the hybrid parallel algorithms achieve better speedup and bandwidth utilization.

4.
An efficient hybrid-parallel solver of the electric and magnetic current combined-field integral equation (JMCFIE) is implemented for scattering by electrically large homogeneous dielectric bodies. On top of pure Message Passing Interface (MPI) parallelism, shared-memory parallelism with OpenMP (Open Multi-Processing) is added to further improve performance. Through flexible process and thread strategies, this hybrid MPI/OpenMP parallel multilevel fast multipole algorithm improves load balancing and scalability. Numerical experiments demonstrate its computing capability on electrically large targets of various sizes, including a dielectric sphere with a radius of 120 m and 110 million unknowns.

5.
Based on a multi-core cluster system, this paper studies parallel programming models in depth and implements the design of an OpenMP/MPI hybrid programming model for a multi-level parallel architecture. Taking an SMP cluster system as the setting, computation is layered into inter-node and intra-node levels, and experiments and analyses are carried out with the multi-level parallel programming model. The performance of the multi-level model is further studied, and a new "large-synchronization" hybrid design idea is proposed. A large-synchronization optimized parallel algorithm for the N-body problem is designed and compared, on a Dawning TC5000A cluster, with the traditional parallel algorithm. Combining theoretical study with extensive experimental analysis and statistics, a number of conclusions on performance optimization of hybrid parallel programming models for multi-core clusters are obtained.

6.
Sound waves are currently the only effective carrier capable of long-range propagation in seawater, so underwater sound propagation is one of the main subjects of ocean acoustics and is of great significance to the design and use of modern sonar. Advances in broadband sound propagation, shallow-water geoacoustic inversion, matched-field localization, underwater environment simulation, and related technologies place ever higher demands on underwater acoustic propagation; how to fully exploit sound-propagation models and modern computing to achieve fast computation of sound propagation has become an important research direction in underwater acoustics. Targeting the multi-core processors of the compute nodes of the Dawning TC4000L high-performance cluster, a parallel ray/normal-mode/parabolic-equation propagation-model algorithm is implemented with a hybrid MPI+OpenMP programming model. Test results show that the designed parallel algorithm achieves high parallel efficiency.

7.
Research and Application of MPI-Based Parallel GMRES(m) Computation and Communication on Cluster Systems   (Cited by: 1; self-citations: 1; others: 0)
Exploiting the inherent parallelism of the GMRES(m) algorithm for solving large dense linear systems, a coarse-grained, low-communication-overhead parallel algorithm is designed on a distributed-memory parallel system using the collective communication mechanisms of MPI, the portable message-passing standard, and is applied to the boundary-element computation of large elasticity problems. Compared with the serial algorithm, the designed parallel algorithm achieves high computational accuracy and efficiency.
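For context, GMRES(m) is the restarted generalized minimal residual method this entry parallelizes. A minimal serial NumPy sketch follows (function and parameter names are mine, not the paper's; the paper's parallel version would distribute the matrix-vector products and inner products via MPI collectives):

```python
# Restarted GMRES(m): Arnoldi builds a Krylov basis of size m, a small
# least-squares problem gives the correction, then the method restarts.
import numpy as np

def gmres_m(A, b, m=20, restarts=50, tol=1e-10):
    n = len(b)
    x = np.zeros(n)
    for _ in range(restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        # Arnoldi process: orthonormal basis V, upper Hessenberg H.
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ V[:, j]              # parallelized via MPI in the paper
            for i in range(j + 1):       # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:      # lucky breakdown
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Small (k+1) x k least-squares problem: min ||beta*e1 - H y||.
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y
    return x
```

The dominant costs per iteration, the matrix-vector product and the Gram-Schmidt inner products, are exactly the reductions that map to MPI collective communication.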

8.
In a multi-core CPU cluster environment, a hybrid MPI+OpenMP algorithm is used to implement FDTD numerical simulation of point-source electromagnetic radiation, effectively overcoming the space and time limitations of the traditional FDTD algorithm in large-scale electromagnetic radiation computation. Domain decomposition, inter-subdomain data communication, and merged nesting are used to improve the program's parallelism, and the computed results are verified for correctness. A performance comparison with a pure MPI algorithm on the Shanghai University high-performance cluster shows that MPI+OpenMP achieves a higher speedup while saving computing resources and accelerating the computation.

9.
This paper explores OpenMP-based multi-core parallel programming of electromagnetic-field FDTD, with the aim of obtaining better performance gains when the method is applied to more complex algorithms. For a one-dimensional electromagnetic-field FDTD problem, the computational method and procedure are briefly described. Parallelization was implemented in Fortran using OpenMP with loop-level parallelism, i.e., only the loop portions are computed in parallel, and the parallel method was then validated in a three-dimensional transient-field electric-dipole radiation FDTD program. The parallel algorithm achieved a better speedup and higher efficiency than other parallel FDTD algorithms. The results show that the OpenMP-based parallel FDTD algorithm has very good speedup and efficiency.
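To make the loop-level structure concrete, here is a minimal 1D FDTD update in NumPy (the grid size, Courant factor of 0.5, and Gaussian source are illustrative assumptions, not taken from the entry). The two field-update loops are exactly the kind of independent per-cell loops an OpenMP parallel-do directive splits among cores:

```python
# Minimal 1D FDTD leapfrog update with a soft Gaussian source.
import numpy as np

def fdtd_1d(nz=200, nt=250, src=100):
    ez = np.zeros(nz)  # electric field
    hy = np.zeros(nz)  # magnetic field
    for t in range(nt):
        # H update: independent per-cell updates -- in the Fortran code
        # this is the loop an OpenMP worksharing directive parallelizes.
        hy[:-1] += 0.5 * (ez[1:] - ez[:-1])
        # E update: likewise independent per cell.
        ez[1:] += 0.5 * (hy[1:] - hy[:-1])
        # Soft Gaussian source at the center cell.
        ez[src] += np.exp(-((t - 30.0) / 10.0) ** 2)
    return ez

ez = fdtd_1d()
```

The time-stepping loop itself is sequential (each step depends on the previous one); only the spatial loops inside a step are parallelizable, which is why the entry's fine-grained approach parallelizes just those.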

10.
In cloud storage, redundancy must be used to guarantee data integrity and reliability. To make better use of the storage nodes, a distributed coding algorithm is proposed based on an analysis of existing redundancy schemes. In the new scheme, encoding and decoding are completed in parallel on the storage nodes, without gathering the whole data file for centralized encoding/decoding, which fully exploits the advantages of a distributed system. Simulation experiments verify the feasibility of the method; it greatly increases the data storage rate, improves data availability, and improves system performance.
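As a toy illustration of redundancy coding with per-node decode (the entry does not specify its actual code, so this single-parity XOR scheme and all names are my own simplification): each data block lives on one node, one extra node holds the XOR parity, and any single lost block can be rebuilt from the survivors:

```python
# Toy single-parity redundancy: n data blocks + 1 XOR parity block.
# Tolerates the loss of any one block; a sketch, not the paper's code.
def encode(blocks):
    parity = bytes(len(blocks[0]))
    for blk in blocks:
        parity = bytes(a ^ b for a, b in zip(parity, blk))
    return blocks + [parity]

def decode(stored, lost_index):
    # Rebuild the missing block by XOR-ing every surviving block;
    # each surviving node can contribute its XOR term independently.
    size = len(next(b for b in stored if b is not None))
    rec = bytes(size)
    for i, blk in enumerate(stored):
        if i != lost_index:
            rec = bytes(a ^ b for a, b in zip(rec, blk))
    return rec
```

Real erasure codes (e.g. Reed-Solomon) tolerate multiple losses, but the key property the entry relies on is the same: encode and decode decompose into per-node operations that run in parallel.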

11.
For input-output-queued switches that use a shared memory as the switching fabric, this paper presents a distributed packet scheduling method, DHIOS (Distributed Hierarchical Ingress and Output Scheduling), and reports detailed simulations. The simulations show that DHIOS supports variable-length packets, guarantees the QoS of traffic flows, and performs well.

12.
A hybrid parallel programming model suitable for multi-core clusters is constructed. The model combines two paradigms: task-oriented TBB programming over shared memory and message-passing-based MPI programming. Combining the strengths of both, it realizes two-level parallelism: processes are mapped to compute nodes, and within each process threads are mapped to processor cores. Compared with programs written in a single paradigm, algorithms using this hybrid model not only reduce execution time and obtain better speedup and efficiency, but also markedly improve cluster performance.

13.
Research on Shared-Memory Communication Techniques in MPI Systems   (Cited by: 1; self-citations: 0; others: 1)
MPI is a message-passing parallel programming interface and has become the mainstream parallel programming model. The advent of multi-core processor systems has made high-performance computing pay more attention to intra-node inter-process communication performance. This paper introduces several intra-node communication protocols and the structures of two MPI implementations (OpenMPI and MPICH2), studies the communication protocols they employ for shared-memory-based message passing, and finally compares and analyzes the point-to-point communication benchmark results of the two, proposing optimization strategies.

14.
Michel Raynal. Annales des Télécommunications, 1993, 48(5-6): 260-267
The advent of distributed-memory parallel machines has made it feasible to implement the shared virtual memory concept in a distributed context. This paper presents a complementary aspect of such an approach, namely protocols that implement a basic centralized synchronization tool: the semaphore. Provided with implementations of the shared virtual memory and semaphore concepts, a programmer can use the very classical programming model based on processes and shared variables, and then execute her program either on a shared-memory multiprocessor or on a distributed-memory parallel machine.
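The semaphore abstraction the paper implements over shared virtual memory can be sketched, in its classical shared-memory form, as a counter guarded by a condition variable. This Python class is a textbook illustration of the P/V semantics, not the paper's distributed protocol:

```python
# Classical counting semaphore: P() blocks while the counter is zero,
# V() increments it and wakes one waiter. The paper's contribution is
# realizing these semantics with protocols over distributed memory.
import threading

class Semaphore:
    def __init__(self, value=1):
        self._value = value
        self._cond = threading.Condition()

    def p(self):  # wait / acquire
        with self._cond:
            while self._value == 0:
                self._cond.wait()
            self._value -= 1

    def v(self):  # signal / release
        with self._cond:
            self._value += 1
            self._cond.notify()
```

In a distributed setting the counter itself becomes shared state, so the protocols must order concurrent P and V operations without a central lock, which is exactly the problem the paper addresses.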

15.
The shared-memory abstraction supported by hardware-based distributed shared memory (DSM) multiprocessors is an inherently consumer-driven means of communication. When a process requires data, it retrieves them from the global shared memory. In distributed cache-coherent systems, the data may reside in a remote memory module or in the producer's cache. Producer-initiated mechanisms reduce communication latency by sending data to the consumer as soon as they are produced. We classify producer-initiated mechanisms as implicit or explicit, according to whether the producer must know the identity of the consumer when data are transmitted. Explicit schemes include data forwarding and message passing. Implicit schemes include update-based coherence, selective updates, and cache-based locks. Several of these mechanisms are evaluated for performance and sensitivity to network parameters, using a common simulated architecture and a set of application kernel benchmarks. StreamLine, a cache-based message-passing mechanism, provides the best performance on the benchmarks with regular communication patterns. Forwarding write and cache-based locks are also among the best-performing producer-initiated mechanisms. Consumer-initiated prefetch, however, has good average performance and is the least expensive to implement.

16.
A multicast replication algorithm is proposed for shared-memory switches. It uses a dedicated FIFO for multicast by replicating cells at the receiver, and the FIFO operates in parallel with the shared memory. Speedup is used to improve loss and delay performance. A new queueing analytical model is developed based on a sub-timeslot approach. The system performance in terms of cell loss and delay is analyzed and verified by simulation.

17.
Thread migration and communication minimization in DSM systems   (Cited by: 1; self-citations: 0; others: 1)
Networks of workstations are characterized by dynamic resource capacities. Such environments can only be efficiently exploited by applications that are dynamically reconfigurable. The paper explores mechanisms and policies that enable online reconfiguration of shared-memory applications through thread migration. We describe the design and preliminary performance of a distributed shared memory (DSM) system that performs online remappings of threads to nodes based on sharing behavior. Our system obtains complete sharing information through a novel correlation-tracking phase that avoids the thread thrashing that characterizes previous approaches. This information is used to evaluate the communication required by a given thread mapping and to predict the resulting performance.

18.
Distributed shared memory is an architectural approach that allows multiprocessors to support a single shared address space that is implemented with physically distributed memories. Hardware-supported distributed shared memory is becoming the dominant approach for building multiprocessors with moderate to large numbers of processors. Cache coherence allows such architectures to use caching to take advantage of locality in applications without changing the programmer's model of memory. We review the key developments that led to the creation of cache-coherent distributed shared memory and describe the Stanford DASH multiprocessor, the first working implementation of hardware-supported scalable cache coherence. We then provide a perspective on such architectures and discuss important remaining technical challenges.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号