Similar Literature
 Found 20 similar documents (search time: 525 ms)
1.
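This work studies in depth the design of cluster communication protocols for CLUMPS. It presents a multi-protocol communication architecture for cluster systems built from SMP nodes, together with its protocol model and the implementation of its protocol communication strategy. Finally, intra-node multi-protocol communication is compared with conventional TCP/IP communication, and experimental data are given. A multi-protocol communication system can fully exploit the characteristics of SMP nodes by using different communication modes between computing units with different interconnect structures, which effectively improves the utilization of the communication hardware and the overall performance of the system.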

2.
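A Processor Allocation Algorithm for Homogeneous Cluster Systems   Total citations: 5, self-citations: 0, citations by others: 5
The distributed computing environment of cluster systems raises new research and application problems for parallel processing and is becoming a hot topic in parallel computing. How reasonably and effectively parallel tasks are partitioned across the nodes of a cluster directly affects execution performance. This paper analyzes the execution overhead factors that affect system execution efficiency and proposes a heuristic processor allocation algorithm.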

3.
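In cluster systems, differences among node machines and the dynamic runtime environment are the main causes of unbalanced system operation. The paper proposes metrics and methods for measuring node differences and load. Using a single parallel task in the SPMD programming model as the test program, and building on performance testing and analysis of the cluster, it performs static task allocation based on the principle of equal run time. During application execution, it monitors the running state of the concurrent threads and has fast machines that have finished their own tasks re-execute the unfinished tasks of slow machines, taking whichever result arrives first. This reduces the impact of the imbalance factors. The algorithm was tested on a large-scale, spatially partitioned Monte Carlo simulation problem, with good results.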
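
The equal-run-time principle for static allocation described in this abstract can be illustrated with a minimal Python sketch. It assumes each node's speed has already been measured as a single benchmark score; the function name `allocate_tasks`, the `speeds` parameter, and the largest-remainder rounding rule are illustrative assumptions, not the paper's actual method.

```python
def allocate_tasks(total_tasks, speeds):
    """Split total_tasks among nodes in proportion to their measured speed.

    speeds[i] is a hypothetical benchmark score (tasks per second) for node i.
    Proportional shares make the expected run time equal on every node.
    """
    if total_tasks < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative task count and positive speeds")

    total_speed = sum(speeds)
    # Ideal fractional share per node: the same run time everywhere.
    ideal = [total_tasks * s / total_speed for s in speeds]
    shares = [int(x) for x in ideal]

    # Hand out the tasks lost to rounding, largest fractional part first.
    leftover = total_tasks - sum(shares)
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # One node twice as fast as the other two receives half of the work.
    print(allocate_tasks(100, [1.0, 1.0, 2.0]))
```

The dynamic phase in the abstract (fast nodes re-running the unfinished tasks of slow nodes) is not covered by this sketch.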

4.
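Using a dynamic scheduling strategy for data task blocks and a full locking technique on heterogeneous multicore cluster systems, the paper presents a multi-process-level and multi-thread-level parallel programming mechanism for data-intensive applications that dynamically schedules data blocks according to each node's main memory and the size of the available shared L2 cache. It also gives strategies and techniques for optimizing the performance of data-intensive parallel programs. Experiments that solve multi-key search over random sequences in parallel on a heterogeneous cluster of multicore computers show that the proposed multicore parallel programming mechanism and performance optimization methods are feasible and efficient.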

5.
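An Optimal Processor Allocation Algorithm for Heterogeneous Cluster Systems   Total citations: 6, self-citations: 0, citations by others: 6
In parallel computing on heterogeneous cluster systems, the partitioning of processor nodes and the mapping of parallel subtasks onto processors directly affect the parallel performance of applications. By analyzing the main parameters that affect parallel computing performance, this paper proposes an optimal processor allocation algorithm based on the A algorithm from artificial intelligence, providing theoretical support for high-performance parallel computing on heterogeneous cluster systems.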

6.
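As the volume of data to be processed in every field keeps growing, data-intensive applications are receiving more and more attention. This paper proposes PSRAM(h), a new parallel program execution model that incorporates information such as the memory access hierarchy and memory access conflicts. Because data-intensive applications are dominated by memory accesses, the PSRAM(h) model simplifies program execution time to memory access time: it predicts the execution time of a serial program by analyzing the memory hierarchy levels and number of accesses of each program segment, and then predicts the execution time of a parallel program as the maximum of the per-thread execution times. Matrix-vector multiplication, the most typical data-intensive application, is analyzed under the PSRAM(h) model. Tests on two platforms, a Loongson 3A processor and an Intel Xeon E5520 processor, show that the error between the model's analysis and the measured results is below 20% in most cases. Thus, for data-intensive applications, PSRAM(h) can not only give a lower bound on program execution time but also predict it effectively.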
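
The prediction rule in this abstract (serial time as a sum of memory access costs, parallel time as the maximum over threads) can be sketched in Python as follows. This is a simplified illustration, not the PSRAM(h) model itself: the latency values in `LATENCY_NS`, the level names, and both function names are assumptions, and memory access conflicts are not modeled.

```python
# Hypothetical per-access latencies in nanoseconds for each memory level.
LATENCY_NS = {"L1": 1.0, "L2": 4.0, "DRAM": 80.0}


def serial_time_ns(segments, latency_ns=LATENCY_NS):
    """Estimate the time of one thread as pure memory access time.

    segments is a list of program segments; each segment is a dict that
    maps a memory level name to the number of accesses served at that level.
    """
    return sum(count * latency_ns[level]
               for segment in segments
               for level, count in segment.items())


def parallel_time_ns(threads, latency_ns=LATENCY_NS):
    """Estimate parallel time as the slowest thread's estimated time."""
    if not threads:
        raise ValueError("at least one thread is required")
    return max(serial_time_ns(segments, latency_ns) for segments in threads)


if __name__ == "__main__":
    thread0 = [{"L1": 100, "DRAM": 10}]    # 100*1 + 10*80 = 900 ns
    thread1 = [{"L2": 50}, {"DRAM": 5}]    # 50*4 + 5*80  = 600 ns
    print(parallel_time_ns([thread0, thread1]))
```

Taking the maximum reflects that a parallel region finishes only when its slowest thread does.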

7.
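With the arrival of the era of big-data computing, chip multiprocessors play a major role in raising the throughput of servers running multithreaded programs, while the access latency of their memory systems increasingly affects system performance. Trace-driven simulation currently runs faster than execution-driven simulation and is widely adopted by memory system researchers. However, when simulating concurrent threads, trace-driven simulation causes both macro-level and micro-level memory access misalignment, which does not occur when a multithreaded program actually runs. Theoretical analysis and calculation show that this misalignment introduces significant deviation into trace-driven simulation results. To address the problem, the paper proposes a method that avoids macro-level and micro-level memory access misalignment in trace-driven simulation and precisely replays the behavior of the multithreaded program recorded in the collection phase. Experimental data show that once macro-level trace misalignment is avoided, several simulation metrics of multithreaded programs change by up to 10.22%; for some memory-intensive multithreaded programs, avoiding micro-level trace misalignment changes the arithmetic mean IPC by more than 50%. The work provides a more accurate trace-driven method for studying the memory system behavior of interacting threads.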

8.
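To meet the performance requirements that critical applications place on information processing capability, as well as the demand for domestically produced components, and based on an analysis of the architectural characteristics of the Loongson 3A processor, the paper designs a high-performance Loongson 3A processing module based on a NUMA parallel processing architecture. It also analyzes and designs for the key problems of withstanding harsh environments, solving issues of heat dissipation, power monitoring and power supply optimization, and boot speed. Testing verifies that the module's performance meets the information processing requirements of critical applications, effectively addressing the limited memory access capability of the Loongson 3A. The paper also discusses how increasing the number of Loongson 3A CPUs relates to memory access performance gains under the SMP and NUMA architectures.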

9.
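Research and Implementation of the PCI Interface in a High-Speed Communication Network Card   Total citations: 1, self-citations: 1, citations by others: 0
Thanks to their excellent price-performance ratio, cluster systems are being used in more and more settings. This paper analyzes the requirements that high-speed communication network cards in cluster systems place on the PCI interface and, following a compact design approach, implements the card's functional logic and the PCI interface in a single FPGA chip. The PCI interface can operate in either master mode or slave mode. It performs well in both microcomputers and SMP machines.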

10.
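A Simulation Algorithm for Cluster System Availability Based on Maintenance Time Constraints   Total citations: 2, self-citations: 0, citations by others: 2
Starting from a description of the operating states of a cluster system, and by quantifying the structural and functional relationships among nodes, the paper gives the probability distributions of the random variables that describe each node in the cluster and the methods for sampling them. Then, by identifying failure states and repair states, it determines the working state of each node, establishes the logical relationships for availability simulation, and obtains an estimate of the cluster system's availability. This provides a quantitative basis for the availability analysis and design of cluster systems.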

11.
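A Memory-Service-Based Memory Sharing Grid System   Total citations: 1, self-citations: 0, citations by others: 1
Chu Rui (褚瑞), Xiao Nong (肖侬), Lu Xicheng (卢锡城). Chinese Journal of Computers (《计算机学报》), 2006, 29(7): 1225-1233
Memory-intensive applications place strict demands on the physical memory of their runtime environment; when physical memory is insufficient, heavy disk I/O results and system performance drops. Traditional network memory addresses this problem within a cluster by sharing the physical memory of idle nodes, but it is strongly affected by cluster load and the internal network. By combining network memory with service computing, grid computing, and related technologies, the paper proposes a memory-service-based memory sharing grid system, called the memory grid, and analyzes and discusses the key techniques and algorithms for implementing the memory service. The memory grid compensates for the shortcomings of network memory and extends the application range of grid computing. Simulation based on the running state of real applications shows that the memory grid improves performance over network memory.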

12.
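Research on a Hybrid Parallel Programming Model for SMP Clusters   Total citations: 9, self-citations: 3, citations by others: 6
The paper proposes a hybrid MPI+OpenMP parallel programming model suited to SMP clusters. The model closely matches the architecture of SMP clusters and combines the advantages of the message-passing and shared-memory programming models, so it can achieve good performance. The implementation mechanism of the hybrid model and the characteristics of the MPI message-passing model are discussed. Experimental results show that, under certain conditions, the hybrid parallel programming model is the best choice for SMP clusters.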

13.
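SMP Cluster: How to Exploit Two-Level Parallelism   Total citations: 3, self-citations: 1, citations by others: 3
Starting from the underlying Linux operating system, this paper examines two different mechanisms for implementing parallelism inside a single SMP system: the thread model (and the OpenMP model), which represents the shared-memory model, and the MPI model, which represents the message-passing model. Then, by analyzing how the two levels of parallelism, across nodes and within a node, should be combined, it concludes that, weighing efficiency and ease of use together, programs on a Linux SMP cluster should be written directly in MPI that communicates through shared memory.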

14.
Clusters of SMPs are hybrid-parallel architectures that combine the main concepts of distributed-memory and shared-memory parallel machines. Although SMP clusters are widely used in the high performance computing community, there exists no single programming paradigm that allows exploiting the hierarchical structure of these machines. Most parallel applications deployed on SMP clusters are based on MPI, the standard API for distributed-memory parallel programming, and thus may miss a number of optimization opportunities offered by the shared memory available within SMP nodes. In this paper we present extensions to the data parallel programming language HPF and associated compilation techniques for optimizing HPF programs on clusters of SMPs. The proposed extensions enable programmers to control key aspects of distributed-memory and shared-memory parallelization at a high level of abstraction. Based on these language extensions, a compiler can adopt a hybrid parallelization strategy which closely reflects the hierarchical structure of SMP clusters by automatically exploiting shared-memory parallelism based on OpenMP within cluster nodes and distributed-memory parallelism utilizing MPI across nodes. We describe the implementation of these features in the VFC compiler and present experimental results which show the effectiveness of these techniques.

15.
This paper presents a Distributed Shared Array runtime system to support Java-compliant multithreaded programming on clusters of symmetric multiprocessors (SMPs). As a hybrid of message passing and shared address space programming models, the DSA programming model allows programmers to explicitly control data distribution so as to take advantage of the deep memory hierarchy, while relieving them from error-prone orchestration of communication and synchronization at run-time. The DSA system is developed as an integral component of mobility support middleware for grid computing so that DSA-based virtual machines can be reconfigured to adapt to the varying resource supplies or demand over the course of a computation. The DSA runtime system also features a directory-based cache coherence protocol in support of replication of user-defined sharing granularity and a communication proxy mechanism for reducing network contention. We demonstrate the programmability of the model in a number of parallel applications and evaluate its performance on a cluster of SMP servers, in particular, the impact of the coherence granularity.

16.
Commercial transaction processing applications are an important workload running on symmetric multiprocessor systems (SMPs). They differ dramatically from scientific, numeric-intensive, and engineering applications because they are I/O bound, and they contain more system software activities. Most SMP servers available in the market have been designed and optimized for scientific and engineering workloads. A major challenge of studying architectural effects on the performance of a commercial workload is the lack of easy access to large-scale and complex database engines running on a multiprocessor system with powerful I/O facilities. Experiments involving case studies have been shown to be highly time-consuming and expensive. In this paper, we investigate the feasibility of using queueing network models with the support of simulation to study the SMP architectural impacts on the performance of commercial workloads. We use the commercial benchmark TPC-C as the workload. A bus-based SMP machine is used as the target platform. Queueing network modeling is employed to characterize the TPC-C workload on the SMP. The system components such as processors, memory, the memory bus, I/O buses, and disks are modeled as service centers, and their effects on performance are analyzed. Simulations are conducted as well to collect the workload-specific parameters (model parameterization) and to verify the accuracy of the model. Our studies find that among disk-related parameters, the disk rotation latency affects the performance of TPC-C most significantly. Among I/O buses and number of disks, the number of I/O buses has the deepest impact on performance. This study also demonstrates that our modeling approach is feasible, cost-effective, and accurate for evaluating the performance of commercial workloads on SMPs, and it is complementary to the measurement-based experimental approaches.
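
To make the idea of modeling system components as service centers concrete, here is a minimal Python sketch of exact Mean Value Analysis (MVA) for a closed, single-class queueing network. The abstract does not say which solution technique the authors use, so MVA is a swapped-in textbook method; the function name `mva` and the service demand values in the example are hypothetical.

```python
def mva(demands, customers, think_time=0.0):
    """Exact MVA for a closed, single-class network of queueing stations.

    demands[k] is the total service demand (seconds per transaction) at
    station k, e.g. processor, memory bus, I/O bus, disk.
    Returns (throughput, response_time, queue_lengths) at the given
    customer population.
    """
    if customers < 1 or not demands:
        raise ValueError("need at least one customer and one station")

    queue = [0.0] * len(demands)  # mean queue lengths with 0 customers
    for n in range(1, customers + 1):
        # An arriving customer sees the queue of the (n - 1)-customer system.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]  # Little's law
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical demands: CPU, memory bus, I/O bus, disk (seconds).
    x, r, q = mva([0.010, 0.002, 0.004, 0.020], customers=16)
    print(f"throughput={x:.1f} tx/s, response={r * 1000:.1f} ms")
```

In such a model the station with the largest demand bounds the achievable throughput at `1 / max(demands)`, which is one way to see why a single component can dominate performance.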

17.
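First, based on the parallel computing model HPM, the paper analyzes the architectural characteristics of multi-machine cluster systems, examines the main factors that affect parallel application performance from the two aspects of parallelism and locality (storage and communication characteristics), and discusses issues in parallelizing and optimizing application software. It analyzes the performance strengths and weaknesses of two programming modes: pure MPI, and MPI+SMP (or OMP) directives. It then discusses methods for optimizing parallel application software on CoSMPs systems. Finally, the performance of two communication patterns (cyclic exchange and boundary exchange) on CoSMPs systems is discussed and optimized on an instance of a multi-machine cluster system, the DW3000 superserver. The approach is verified with two computational examples, matrix multiplication and the five-point scheme for solving partial differential equations; the measured results agree with the theoretical analysis.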

18.
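The paper introduces a NUMA multiprocessor prototype system and analyzes how the operating system on that system manages physical memory. Based on this system, a global shared memory system is designed and implemented, allowing the operating system to make full use of the physical memory across the whole system and reducing application execution time. Experimental results show that the system provides better support for memory-intensive applications.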

19.
Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory-intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory-intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory-intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open-source Big Data management software platform that scales out horizontally on shared-nothing commodity computing clusters. We describe the implementation of AsterixDB's memory-intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.

20.
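The paper studies in depth the hybrid MPI+OpenMP programming model on SMP cluster systems whose compute nodes are multicore processors. Two hybrid parallel algorithms for matrix multiplication are developed and run on a multicore cluster platform alongside a pure MPI algorithm, and their performance is compared. The experiments show that hybrid programming achieves better performance.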
