
1.

胡晨骏, 王晓蔚 《Computer Technology and Development》2008,18(4):70-73

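Parallel computing is one of the important directions in the development of computer technology. Current parallel programming models fall into two main types: the message-passing model and the shared-memory model. Multi-core processors integrate two or more complete computing engines (cores) on a single chip. Making full use of the characteristics of multi-core computers to exploit their performance has therefore become an important research direction. This paper introduces a new MPI implementation mechanism that combines the advantages of the shared-memory and message-passing models. It uses the shared-memory model within a node and the message-passing model between nodes, and it obtains better performance by automatically generating thread-level tasks.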

2.

Research on a Hybrid Parallel Genetic Algorithm for Multi-core Cluster Systems

王竹荣, 巨涛, 马凡 《Computer Science》2011,38(7):194-199

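Traditional genetic algorithms evolve slowly on large-scale combinatorial optimization problems, which makes real-time requirements hard to meet. To address this, the paper proposes a model for implementing a "coarse-grained/master-slave" hybrid parallel genetic algorithm on multi-core PC clusters. The algorithm is mapped onto the cluster by combining the message-passing and shared-memory parallel programming models:
- Between nodes, the message-passing model (MPI) is used, corresponding to a coarse-grained parallel genetic algorithm.
- Within nodes, the shared-memory model (OpenMP) is used, corresponding to a master-slave parallel genetic algorithm.

Hybrid MPI and OpenMP programming implements the algorithm with two levels of parallelism, processes and threads. Theoretical analysis and experimental results show that the proposed model performs well and substantially remedies the shortcomings of the traditional genetic algorithm. It provides an effective and feasible way to use parallel genetic algorithms for large-scale combinatorial optimization problems on ordinary multi-core PC clusters.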
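The two-level structure described above can be sketched minimally as follows. The objective (OneMax), tournament selection, one-point crossover and ring migration are all hypothetical choices and are not taken from the paper. Islands stand in for MPI processes (the coarse-grained level), and a thread pool evaluating fitness stands in for the OpenMP master-slave level. Under CPython's global interpreter lock the threads illustrate the structure rather than deliver a speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

GENOME_LEN = 32


def fitness(genome):
    # OneMax: number of 1 bits (illustrative objective only)
    return sum(genome)


def evolve_island(pop, rng, pool):
    # Master-slave level: fitness evaluations are farmed out to worker threads
    scores = list(pool.map(fitness, pop))

    def tournament():
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if scores[a] >= scores[b] else pop[b]

    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, GENOME_LEN)  # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [bit ^ 1 if rng.random() < 1 / GENOME_LEN else bit for bit in child]
        new_pop.append(child)
    # Elitism: the island's best individual survives unchanged
    new_pop[0] = pop[max(range(len(pop)), key=scores.__getitem__)]
    return new_pop


def migrate(islands):
    # Coarse-grained level: each island sends its best individual to the next
    # island in a ring, where it replaces the worst one (an MPI exchange in practice)
    bests = [max(isl, key=fitness) for isl in islands]
    for i, isl in enumerate(islands):
        worst = min(range(len(isl)), key=lambda j: fitness(isl[j]))
        isl[worst] = list(bests[i - 1])


def hybrid_ga(num_islands=4, pop_size=20, generations=40,
              migrate_every=5, workers=4, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(GENOME_LEN)]
                for _ in range(pop_size)] for _ in range(num_islands)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for gen in range(1, generations + 1):
            islands = [evolve_island(isl, rng, pool) for isl in islands]
            if gen % migrate_every == 0:
                migrate(islands)
    return max(fitness(ind) for isl in islands for ind in isl)
```

Elitism within islands, plus migration of each island's best individual, means the best fitness found never decreases from one generation to the next.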

3.

Design and Implementation of Communication Mechanisms for a Clock-Sharing Multithreaded Processor

《Application of Electronic Technique》2016,(3):42-46

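Multi-core multithreaded processors [1] are one direction in the development of parallel technology. Building on such processors, this paper proposes a clock-sharing multithreaded processor with two communication mechanisms. Near-neighbor communication passes information through shared near-neighbor FIFOs, and inter-thread communication passes information through memory shared among threads. These mechanisms improve the processor's resource utilization and parallel execution capability.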
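A minimal software analogue of the near-neighbor FIFO mechanism might look like the sketch below. It assumes Python threads stand in for hardware thread contexts and a bounded `queue.Queue` stands in for each shared FIFO. The per-stage work (adding a constant) is hypothetical, and the paper's hardware design is not reproduced.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the data stream


def run_pipeline(values, num_stages=3, fifo_depth=2):
    """Push values through a chain of stages; stage i adds i + 1 to each item.

    Neighbouring stages share exactly one bounded FIFO, so every stage
    communicates only with its immediate predecessor and successor.
    """
    fifos = [queue.Queue(maxsize=fifo_depth) for _ in range(num_stages + 1)]

    def stage(idx):
        inbox, outbox = fifos[idx], fifos[idx + 1]
        while True:
            item = inbox.get()          # blocks while the FIFO is empty
            if item is STOP:
                outbox.put(STOP)        # forward the sentinel downstream
                return
            outbox.put(item + idx + 1)  # blocks while the FIFO is full

    def producer():
        for v in values:
            fifos[0].put(v)
        fifos[0].put(STOP)

    threads = [threading.Thread(target=stage, args=(i,)) for i in range(num_stages)]
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()

    results = []
    while True:  # the caller drains the last FIFO
        item = fifos[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


print(run_pipeline([0, 10, 20]))  # [6, 16, 26]
```

Each stage touches only the FIFO before it and the FIFO after it, so communication stays strictly between neighbors. The bounded queues provide the back-pressure that a fixed-depth hardware FIFO would.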

4.

Research on Column-Store Join Optimization on Integrated CPU-GPU Architectures

丁祥武, 李子通 《Computer Science》2016,43(11):265-271, 308

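Integrated multi-core CPU-GPU architectures have become the direction of development for processor chips, and using their parallel computing power for data processing has become a research focus in the database field. To improve the query performance of column-store systems, this paper makes three contributions:
- It improves the workload distribution strategy of an existing co-processing mechanism. By monitoring the database system's CPU utilization, it dynamically gives the processors a suitable data partitioning.
- For data prefetching on integrated multi-core CPU-GPU architectures, it proposes a model for determining the prefetch size. It also optimizes GPU memory accesses based on their characteristics.
- Using OpenCL, it implements a column-store sort-merge join algorithm on an integrated multi-core CPU-GPU architecture and optimizes join processing with the proposed methods.

Experiments show that the proposed optimization strategies improve the sort-merge join performance of the column-store system by 33%.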
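For readers unfamiliar with the join being optimized, here is a single-threaded sketch of a sort-merge join on key columns. It assumes comparable keys and produces a join index of row-id pairs, as column stores commonly do. The paper's OpenCL implementation, CPU/GPU load distribution and prefetch-size model are not represented.

```python
def sort_merge_join(left_keys, right_keys):
    """Return (left_row, right_row) pairs whose keys are equal.

    Inputs are key columns; the output is a join index of row ids that can
    later be used to fetch the other columns of each table.
    """
    # Sort row ids by key instead of moving the column data itself
    lorder = sorted(range(len(left_keys)), key=left_keys.__getitem__)
    rorder = sorted(range(len(right_keys)), key=right_keys.__getitem__)
    result = []
    i = j = 0
    while i < len(lorder) and j < len(rorder):
        lk, rk = left_keys[lorder[i]], right_keys[rorder[j]]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product
            i_end = i
            while i_end < len(lorder) and left_keys[lorder[i_end]] == lk:
                i_end += 1
            j_end = j
            while j_end < len(rorder) and right_keys[rorder[j_end]] == rk:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    result.append((lorder[a], rorder[b]))
            i, j = i_end, j_end
    return result


print(sorted(sort_merge_join([3, 1, 2, 2], [2, 3, 3, 5])))
# [(0, 1), (0, 2), (2, 0), (3, 0)]
```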

5.

A Performance Evaluation Model for the Memory Hierarchy of Multi-core Processors

郭建军, 戴葵, 王志英 《Journal of Computer Research and Development》2009,46(Z1)

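This paper presents a queueing-theory model for evaluating the performance of the memory hierarchy of multi-core processors. The model solves quickly and can show, early in the design process, how different configuration parameters affect overall processor performance. Designers can then adjust the memory hierarchy and optimize the design.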
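The abstract does not give the model itself. The sketch below only illustrates the general queueing approach under stated assumptions. Each level of the hierarchy is treated as an independent M/M/1 queue, with mean response time 1/(μ − λ). Misses at one level form the arrival stream of the next, and the last level (main memory) satisfies every request. All rates and hit ratios are hypothetical parameters.

```python
def hierarchy_response_time(request_rate, levels):
    """Mean time per memory request through a chain of M/M/1 levels.

    request_rate: total arrival rate at the first level
                  (e.g. number of cores x per-core request rate).
    levels: list of (service_rate, hit_ratio); the last level should have
            hit_ratio 1.0 so that every request is eventually served.
    """
    total, reach, rate = 0.0, 1.0, request_rate
    for service_rate, hit_ratio in levels:
        if rate >= service_rate:
            raise ValueError("level is saturated (arrival rate >= service rate)")
        total += reach / (service_rate - rate)  # M/M/1 response time, weighted by
                                                # the fraction of requests reaching it
        reach *= 1.0 - hit_ratio                # fraction that misses this level
        rate *= 1.0 - hit_ratio                 # miss traffic forwarded downward
    return total


levels = [(10.0, 0.5), (4.0, 1.0)]   # hypothetical cache level + memory
for cores in (1, 2, 3):
    print(cores, hierarchy_response_time(cores * 2.0, levels))
```

Sweeping a configuration parameter such as the core count in this way is the kind of quick early-design estimate the abstract describes. A contended shared level shows up as a rapidly growing term as its arrival rate approaches its service rate.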

6.

Research on Hybrid Parallel Computing with MPI and OpenMP

李苏平, 刘羽, 刘彦宇 《Software Guide》2010,(3)

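Targeting the hardware architecture of multi-core cluster systems, this paper presents a hybrid parallel programming technique. It uses MPI message passing between nodes and OpenMP shared memory within each node. The programming model combines the advantages of both and makes more effective use of the hardware resources of multi-core clusters. A single-level hybrid parallel Jacobi algorithm for computing the eigenvalues of symmetric matrices was built. Experimental results show that the hybrid parallel algorithm achieves a better speedup than a pure MPI algorithm.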
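For reference, parallel Jacobi eigenvalue codes build on the serial cyclic Jacobi method, sketched below. This is the textbook version with the standard rotation formulas, not the paper's hybrid code. In parallel variants, rotations on disjoint index pairs (p, q) are independent and can be applied concurrently, which is what an MPI/OpenMP decomposition exploits.

```python
import math


def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes a[p][q]
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # A <- A J  (columns p and q)
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A  (rows p and q)
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # approximately [1.0, 3.0]
```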

7.

A Comparison of Shared Memory and Message Passing on PC Clusters (total citations: 7; self-citations: 0; citations by others: 7)

章隆兵, 吴少刚, 蔡飞, 胡伟武 《Journal of Software》2004,15(6):842-849

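Shared memory and message passing are currently the two mainstream parallel programming models. Message passing is generally considered less programmer-friendly than shared memory. OpenMP is the de facto industry standard for shared-memory programming. Cluster OpenMP systems provide an OpenMP programming environment on clusters and are easy to program and scalable, but their performance has long been a focus of attention. Taking the cluster OpenMP system OpenMP/JIAJIA and a typical message-passing system…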

8.

An MCAPI-Based Method for Multi-core Software Development

《Application of Electronic Technique》2016,(1):31-33

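This paper presents a multi-core software development method based on the Multicore Communications API (MCAPI) standard. The standard provides a message-passing API suited to inter-core communication, which greatly improves the portability of applications across multi-core processors. Software is developed with the poly-platform tool in four steps:
1. Establish the topology.
2. Define the node projects and complete memory allocation and related tasks.
3. Implement inter-node communication with MCAPI templates.
4. Write the application for each node.

This development flow is independent of vendors, devices and operating systems. It maps applications quickly and flexibly onto different homogeneous and heterogeneous multi-core architectures and greatly improves the efficiency of multi-core software development.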

9.

Research on an OpenMP-Based Kriging Interpolation Algorithm

陈欢, 谢健 《Computer Science》2012,39(106):392-395

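With the spread of multi-core processors, computing is moving toward multi-core architectures and multi-core computing techniques that make full use of multi-core PCs. This work aimed to speed up air-temperature interpolation on a fine 100 m × 100 m grid for Hunan Province. To do so, the Kriging interpolation algorithm was improved with OpenMP, the standard shared-memory parallel programming model. Mean air temperature was interpolated on multi-core PCs with different numbers of cores, using terrain data on 100 m × 100 m and 500 m × 500 m grids. The approach effectively reduced interpolation time and increased the algorithm's speedup. Once integrated into the operational system, it also greatly improved the system's response time and performance.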
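A rough sketch of why Kriging parallelizes well follows. In ordinary kriging each target grid point is estimated independently from the same sample set, so the loop over grid points maps naturally onto an OpenMP `parallel for`. The exponential variogram, its parameters and the thread-pool stand-in are assumptions for illustration, not details from the paper. Under CPython's global interpreter lock, the thread pool shows the structure rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def exp_variogram(h, sill=1.0, corr_range=1000.0):
    # Hypothetical exponential variogram without nugget; gamma(0) = 0
    return sill * (1.0 - math.exp(-h / corr_range))


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def ordinary_kriging(samples, targets, variogram=exp_variogram, workers=4):
    """samples: [(x, y, value)]; targets: [(x, y)]; one estimate per target."""
    pts = [(x, y) for x, y, _ in samples]
    vals = [v for _, _, v in samples]
    n = len(pts)
    # The ordinary kriging matrix [[Gamma, 1], [1^T, 0]] is shared by all targets
    lhs = [[variogram(math.dist(pts[i], pts[j])) for j in range(n)] + [1.0]
           for i in range(n)]
    lhs.append([1.0] * n + [0.0])

    def estimate(t):
        rhs = [variogram(math.dist(p, t)) for p in pts] + [1.0]
        weights = solve(lhs, rhs)[:n]  # last unknown is the Lagrange multiplier
        return sum(w * v for w, v in zip(weights, vals))

    # Grid points are independent: the analogue of an OpenMP parallel for
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate, targets))
```

With no nugget, ordinary kriging reproduces the sample value exactly at a sample location. This gives a simple sanity check for any parallelized version.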

10.

Divisible Load Scheduling with Concurrent Data Sending from the Master Node on Multi-core Clusters

钟诚, 蔡德霞, 杨锋 《Journal of Computer Research and Development》2014,(6)

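This paper targets cluster systems whose nodes differ in computing, communication and storage capabilities. Each node consists of multiple multi-core processors (multiple chip multiprocessors) that share an L3 cache.

Overlapping computation with transmission, the paper proposes a divisible load scheduling model in which the master node sends data concurrently to the worker nodes using multiple processes. The model adapts to nodes with different computing, communication and storage capabilities. It dynamically determines the number of scheduling rounds and the size of the load block assigned to each worker in each round. This balances the computational load across nodes, reduces inter-node communication overhead and shortens the schedule length.

Based on the available capacity of the L3, L2 and L1 caches in each node, the paper also proposes a data allocation method that partitions the load blocks received in a node's main memory across the cache levels. This keeps the load balanced among the multi-core processors in a node and among their cores.

Using the inter-node scheduling model and the intra-node multi-level data allocation method, a communication- and storage-efficient parallel k-selection algorithm is designed and implemented. It targets heterogeneous clusters whose nodes contain multiple multi-core processors. On the Dawning TC5000A multi-core cluster, the paper tests three factors affecting the algorithm's performance:
- whether the master sends data to the workers in parallel or serially;
- the utilization of each cache level;
- the number of threads run on each core.

Experimental results show three findings:
- The parallel k-selection algorithm based on the concurrent-sending scheduling model outperforms the one based on the serial-sending model.
- L3 and L2 cache utilization strongly affect performance.
- Running time is shortest when the L3, L2 and L1 cache utilizations take their optimal combination and each core runs 3 threads.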
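The core idea of divisible load scheduling can be illustrated with a much-simplified model. The assumptions are:
- a single round;
- a master that sends to all workers at once over independent links;
- a linear per-unit communication cost c_i and computation cost w_i for worker i;
- no overlap between communication and computation.

The paper's model is multi-round and overlaps the two, so this sketch is not its scheduler. Under these assumptions, worker i finishes at α_i·L·(c_i + w_i). Requiring all workers to finish together gives load fractions α_i proportional to 1/(c_i + w_i).

```python
def concurrent_load_fractions(comm_costs, comp_costs):
    """Load fraction per worker so that all workers finish at the same time.

    With concurrent sending, worker i finishes at alpha_i * L * (c_i + w_i);
    equal finish times give alpha_i proportional to 1 / (c_i + w_i).
    """
    inv = [1.0 / (c + w) for c, w in zip(comm_costs, comp_costs)]
    total = sum(inv)
    return [x / total for x in inv]


def makespan(load, fractions, comm_costs, comp_costs):
    # Time at which the last worker finishes under the same cost model
    return max(a * load * (c + w)
               for a, c, w in zip(fractions, comm_costs, comp_costs))


comm, comp = [1.0, 1.0], [1.0, 3.0]        # hypothetical per-unit costs
alpha = concurrent_load_fractions(comm, comp)
print(alpha, makespan(90, alpha, comm, comp), makespan(90, [0.5, 0.5], comm, comp))
```

With these costs the balanced split finishes at about 120 time units, while an equal split leaves the slower worker busy until 180.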

11.

Research and Analysis of Parallel Programming Models for SMP Cluster Systems

宋伟, 宋玉 《Microcomputer Development》2007,17(2):164-167

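Parallel computing is one of the important directions in the development of computer technology, and SMPs and clusters are the current mainstream parallel architectures. Parallel programs today mainly use MPI, based on the message-passing model, or OpenMP, based on the shared-memory model. Each has its own characteristics and scope of application. This paper analyzes the characteristics of SMP clusters, MPI and OpenMP, and presents a feasible method for hybrid MPI and OpenMP programming on SMP cluster systems.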

12.

A two-level parallelization method for distributed hydrological models

《Environmental Modelling & Software》2016

This paper proposes a scalable two-level parallelization method for distributed hydrological models that can use parallelizability at both the sub-basin level and the basic simulation-unit level (e.g., grid cell) simultaneously. This approach first uses the message-passing programming model to dispatch parallel tasks at the sub-basin level to different nodes with multi-core CPUs in the cluster. Each node is responsible for some of the sub-basins. Parallel tasks for each sub-basin at the basic simulation-unit level are then dispatched to multiple cores within each node using the shared-memory programming model. A grid-based distributed hydrological model was parallelized to demonstrate the performance of the proposed method, which was tested in different scenarios (e.g., different data volume, different numbers of sub-basins). Results show that the proposed two-level parallelization method had better scalability than the parallel computation at sub-basin level alone, and the parallel performance increased with data volume and the number of sub-basins.

13.

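Design and Implementation of a Concurrent Computing Software Architecture Based on Multi-core Processors (total citations: 3; self-citations: 2; citations by others: 1)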

汪少敏, 赵猛, 朱振博, 王艳琦 《Computer Science》2008,35(7):283-285

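In many IT application domains, the demand for real-time concurrent processing by processor chips keeps rising. This is driving the rapid development of multi-core processor chips and of high-performance application systems built around them. This paper proposes a three-layer software architecture for high-performance concurrent processing applications on heterogeneous multi-core processor systems. The architecture makes full use of the multi-core structure of heterogeneous processors to accelerate concurrent processing, and it greatly simplifies application development on heterogeneous multi-core platforms. Its effectiveness was preliminarily verified in a prototype voice-conferencing system for telecom applications built on the Cell processor platform.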

14.

An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

Jayasimha D. N., Hayder M. E., Pillay S. K. 《The Journal of Supercomputing》1997,11(1):41-60

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time-accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared-memory multiprocessor (the CRAY Y-MP), and distributed-memory multiprocessors with different topologies (the IBM SP and the CRAY T3D). We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message-passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms. This revised version was published online in June 2006 with corrections to the Cover Date.

15.

Porting a global ocean model onto a shared-memory multiprocessor: Observations and guidelines

Richard J. Procassini, Scott R. Whitman, William P. Dannevik 《The Journal of Supercomputing》1993,7(3):287-321

A three-dimensional global ocean circulation model has been modified to run on the BBN TC2000 multiple instruction stream/multiple data stream (MIMD) parallel computer. Two shared-memory parallel programming models have been used to implement the global ocean model on the TC2000: the TCF (TC2000 Fortran) fork-join model and the PFP (Parallel Fortran Preprocessor) split-join model. The method chosen for the parallelization of this global ocean model on a shared-memory MIMD machine is discussed. The performance of each version of the code has been measured by varying the processor count for a fixed-resolution test case. The statically scheduled PFP version of the code achieves a higher parallel computing efficiency than does the dynamically scheduled TCF version of the code. The observed differences in the performance of the TCF and PFP versions of the code are discussed. The parallel computing performance of the shared-memory implementation of the global ocean model is limited by several factors, most notably load imbalance and network contention. The experience gained while porting this large, real world application onto a shared-memory multiprocessor is also presented to provide insight to the reader who may be contemplating such an undertaking.

16.

A novel strategy for building interoperable MPI environment in heterogeneous high performance systems

Francisco Isidro Massetto, Liria Matsumoto Sato, Kuan-Ching Li 《The Journal of Supercomputing》2012,60(1):87-116

Breakthrough advances in microprocessor technology and efficient power management have altered the course of processor development with the emergence of multi-core processor technology, which brings higher levels of processing power. The use of many-core technology has boosted the computing power provided by clusters of workstations or SMPs, delivering large computational power at an affordable cost using only commodity components. Different implementations of message-passing libraries and system software (including operating systems) are installed in such cluster and multi-cluster computing systems. To guarantee correct execution of a message-passing parallel application in a computing environment other than the one in which it was originally developed, the application code must be reviewed. In this paper, a hybrid communication interfacing strategy is proposed to execute a parallel application on a group of computing nodes belonging to different clusters or multi-clusters (which may run different operating systems and MPI implementations), interconnected with public or private IP addresses and responding interchangeably to user execution requests. Experimental results from executing benchmark parallel applications demonstrate the feasibility and effectiveness of the proposed strategy.

17.

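Research on the Architecture Design of Heterogeneous Multi-core Processors (total citations: 2; self-citations: 0; citations by others: 2)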

陈芳园, 张冬松, 王志英 《Computer Engineering & Science》2011,33(12):27-36

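Multi-core technology has become an important direction in processor development. Heterogeneous multi-core processors can assign different types of computing tasks to different types of cores for parallel processing. They therefore offer more flexible and efficient processing for applications with different requirements, and they have become a research focus. From an architectural perspective, this paper discusses the key issues in heterogeneous multi-core processor design. It analyzes core structure, interconnect, memory system, operating system support, testing and verification, dynamic voltage scaling and other aspects...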

18.

Efficient compiler and run-time support for parallel irregular reductions

Hwansoo Han, Chau-Wen Tseng 《Parallel Computing》2000,26(13-14)

Many scientific applications consist of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, whose results are then combined using synchronization. We develop LocalWrite, a new technique that partitions irregular reductions so that each processor computes values only for locally assigned data, eliminating the need for buffers or synchronized writes. Computation is replicated if its results are needed on multiple processors. We experimentally evaluate its performance for three irregular codes on a software DSM running on a distributed-memory multiprocessor and on two shared-memory multiprocessors, while varying connectivity, locality, and adaptivity. Results show that LocalWrite improves performance significantly compared to replicated buffers, and can match or exceed explicit message-passing gather/scatter for applications with low locality or high adaptivity.
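The owner-computes partitioning described in this abstract can be sketched as follows. The sketch assumes an edge list in which each edge's contribution is added to one endpoint and subtracted from the other, a hypothetical stand-in for something like pairwise forces; it is not the paper's compiler or run-time system. Each worker owns a contiguous block of nodes and writes only to those, so no replicated buffers or locks are needed. An edge whose endpoints have different owners is computed by both workers, which is the replicated computation the abstract mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def edge_value(w):
    # Stand-in for an expensive per-edge computation (hypothetical)
    return 0.5 * w


def reduce_sequential(num_nodes, edges):
    acc = [0.0] * num_nodes
    for u, v, w in edges:
        f = edge_value(w)
        acc[u] += f
        acc[v] -= f
    return acc


def reduce_owner_writes(num_nodes, edges, num_workers=3):
    acc = [0.0] * num_nodes
    # Worker k owns nodes bounds[k] .. bounds[k + 1] - 1
    bounds = [num_nodes * k // num_workers for k in range(num_workers + 1)]

    def work(k):
        lo, hi = bounds[k], bounds[k + 1]
        for u, v, w in edges:
            u_mine, v_mine = lo <= u < hi, lo <= v < hi
            if not (u_mine or v_mine):
                continue
            f = edge_value(w)  # computed by both owners if u and v differ in owner
            if u_mine:
                acc[u] += f    # writes touch only this worker's nodes
            if v_mine:
                acc[v] -= f

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, range(num_workers)))  # list() re-raises worker errors
    return acc
```

For brevity each worker scans the whole edge list. A practical implementation would partition the edges per worker once and reuse that partition across iterations.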

19.

Research on Hybrid Programming Models for Multi-core Cluster Systems

张军, 万剑怡 《Computer and Modernization》2009,(5)

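This paper studies in depth the MPI+OpenMP hybrid programming model for SMP clusters whose compute nodes are multi-core processors. Two hybrid parallel algorithms for matrix multiplication were built and tested on a multi-core cluster platform, and their performance was compared with that of a pure MPI algorithm. The experiments show that hybrid programming performs better.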
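The sketch below shows the two-level decomposition common to hybrid matrix-multiplication codes. It assumes a row-block distribution, a hypothetical choice because the abstract does not describe the paper's two algorithms. The outer loop over `rank` stands in for MPI processes, each owning a block of rows of A and a full copy of B. The thread pool inside each rank stands in for OpenMP threads splitting that block. Here the ranks run one after another in a single process, and the gather step is a list concatenation.

```python
from concurrent.futures import ThreadPoolExecutor


def block_range(n, parts, k):
    # Contiguous block k of range(n) split into `parts` near-equal pieces
    return n * k // parts, n * (k + 1) // parts


def rank_matmul(a, b, rank, num_ranks, threads):
    # Work of one "rank": its row block of A times the (replicated) matrix B
    lo, hi = block_range(len(a), num_ranks, rank)
    cols = list(zip(*b))

    def row(i):
        return [sum(x * y for x, y in zip(a[i], col)) for col in cols]

    # Intra-rank level: rows of the block are shared among threads
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(row, range(lo, hi)))


def hybrid_matmul(a, b, num_ranks=2, threads=2):
    # Inter-rank level: each rank computes its block, then results are gathered
    result = []
    for rank in range(num_ranks):
        result.extend(rank_matmul(a, b, rank, num_ranks, threads))
    return result


print(hybrid_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```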