Similar literature
1.
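Traditional network file systems struggle to meet the I/O demands of high-performance computing systems. The parallel network file system PNFS can effectively address the scalability, availability, and performance problems of traditional network file systems. The paper first designs the PNFS architecture, separating the metadata server from the storage servers and eliminating the I/O bottleneck caused by a centralized server structure. It then benchmarks a PNFS prototype and compares the results with NFS measured in the same environment. The results show that PNFS gives clients parallel access to file data, delivers higher I/O read/write bandwidth and lower access latency, and scales client I/O bandwidth linearly with the number of storage servers, so it meets the I/O needs of high-performance computing well.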

2.
In this paper we propose a new simulation platform, called SIMCAN, for analyzing parallel and distributed systems. The platform is intended for testing parallel and distributed architectures and applications. The main characteristics of SIMCAN are flexibility, accuracy, performance, and scalability. Accordingly, the proposed platform has a modular design that eases the integration of different basic systems on a single architecture. Its design follows a hierarchical schema that includes simple modules, basic systems (computing, memory managing, I/O, and networking), physical components (nodes, switches, …), and aggregations of components. New modules may also be incorporated to include new strategies and components. In addition, a graphical configuration tool has been developed to help untrained users model new architectures. Finally, a validation process and some evaluation tests have been performed to evaluate the SIMCAN platform.

3.
Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMWare Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMWare), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization provided by OpenVZ gives the best overall performance, particularly for MPI scalability. With the knowledge gained from our VM evaluation, we extend OpenVZ to include support for checkpointing and fault tolerance for MPI-based virtual server distributed computing.

4.
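蒋筱斌, 熊轶翔, 张珩, 武延军, 赵琛. 《软件学报》 (Journal of Software), 2023, 34(4): 1977-1996
As data volumes grow and data structures diversify, how to use heterogeneous multi-coprocessors connected by modern interconnect links to provide a real-time, reliable parallel runtime environment for large-scale data processing has become a research focus in the high-performance computing and database fields. Modern servers equipped with multiple coprocessor (GPU) devices (multi-GPU servers) have become the preferred high-performance platform for analyzing large-scale, irregular graph data. Graph computing systems and algorithms (such as breadth-first traversal and shortest-path algorithms) designed in existing work for the multi-GPU server architecture already significantly outperform multi-core CPU environments. In these systems, however, the transfer of graph partition data between GPU coprocessors is limited by PCI-E bus bandwidth and local latency. As a result, adding GPU devices does not yield near-linear growth in overall system performance and can even cause severe latency jitter, so the high scalability required by large-scale parallel graph computing systems cannot be met. A series of benchmark experiments reveals two kinds of defects in existing systems: (1) the hardware architecture of the data paths between modern GPU devices keeps evolving (e.g., NVLink-V1, NVLink-V2), with greatly improved link bandwidth and latency, yet existing systems are confined to the PCI-E bus for partitioned data communication and cannot fully exploit modern GPU link resources (including link topology, connectivity, and routing); (2) in...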

5.
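This paper reviews research progress in China on parallel computing for large-scale numerical reservoir simulation. It presents computing examples and application results for fine-grained reservoir simulation on a domestically built Beowulf system, reports numerical simulation results and a performance analysis for a reservoir case with millions of grid points at different processor counts, and implements a post-processing display system that visualizes massive data as 3D plots, 2D plots, and tables.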

6.
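A cluster-based architecture for streaming media caching proxy servers   Total citations: 1, self-citations: 0, citations by others: 1
The paper proposes a cluster-based architecture for streaming media caching proxy servers and introduces a content-based front-end load-balancing scheduling policy into the architecture. It analyzes in detail the system design principles, the structure of each component module, and the message communication mechanism between modules. A prototype was implemented mainly in C and C++ on Linux, and tests of the prototype show that the overall design is sound, performs well, and has good stability and scalability.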

7.
Although CAD tools have significantly assisted electronic system simulation, the system-level optoelectronics modeling field has lagged behind due to a lack of simulation methodologies and tools. Optisim, a system-level modeling and simulation methodology for optical interconnects in HPC systems, can provide computer architects, designers, and researchers with a highly optimized, efficient, and accurate discrete-event environment to test various HPC systems.

8.
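楚国锋, 陈麒, 张鸾. 《微机发展》 (Microcomputer Development), 2011, (10): 66-69
To help P2P researchers understand, choose, and use simulators, this paper looks for a P2P network simulator with good performance. Using a comparative approach, it first analyzes the advantages of simulation for testing and validating P2P networks. It then introduces representative current P2P simulators by category: traditional general-purpose network simulators, protocol-specific P2P simulators, general-purpose P2P simulators, and parallelized general-purpose P2P simulators. Finally, it compares the strengths and weaknesses of these simulators in terms of their scope of applicability. It concludes that a P2P network simulator should be chosen according to actual needs, considering five aspects: architecture, usability, extensibility, statistics, and underlying network simulation. It also analyzes future trends in P2P simulators from several angles.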

9.
In conventional video-on-demand systems, video data are stored in a video server for delivery to multiple receivers over a communications network. The video server's hardware limits the maximum storage capacity as well as the maximum number of video sessions that can be delivered concurrently. Clearly, these limits will eventually be exceeded by the growing need for better video quality and a larger user population. This paper studies a parallel video server architecture that exploits server parallelism to achieve incremental scalability. First, unlike data partition and replication, the architecture employs data striping at the server level to achieve fine-grain load balancing across multiple servers. Second, a client-pull service model is employed to eliminate the need for interserver synchronization. Third, an admission-scheduling algorithm is proposed to further control the instantaneous load at each server so that linear scalability can be achieved. This paper analyzes the performance of the architecture by deriving bounds for server service delay, client buffer requirement, prefetch delay, and scheduling delay. These performance metrics and design tradeoffs are further evaluated using numerical examples. Our results show that the proposed parallel video server architecture can be linearly scaled up to more concurrent users simply by adding more servers and redistributing the video data among the servers.
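Server-level data striping, as named in this abstract, can be illustrated with a minimal sketch. It assumes fixed-size stripe units placed round-robin across servers; the abstract does not state the actual placement rule, and `stripe_layout` and its parameters are hypothetical names chosen for illustration.

```python
def stripe_layout(video_size, stripe_unit, num_servers):
    """Map each stripe unit of one video to a server, round-robin.

    Assumption: fixed-size units and round-robin placement. The paper's
    actual placement policy is not given in the abstract.
    """
    if video_size < 0 or stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("sizes must be non-negative and counts positive")
    num_units = -(-video_size // stripe_unit)  # ceiling division
    layout = {server: [] for server in range(num_servers)}
    for unit in range(num_units):
        # Consecutive units land on consecutive servers, so a sequential
        # playback spreads its load evenly over all servers.
        layout[unit % num_servers].append(unit)
    return layout
```

Under this assumed rule, every server holds a near-equal share of each video, which is what makes per-title load balancing independent of title popularity.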

10.
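In applications for many-core processors, the main difficulties lie in heterogeneous parallel application models and load-balancing strategies; for computational fluid dynamics, schemes must be designed for the specific applications. Noting that serial programs for direct numerical simulation of turbulence contain some subroutines or functions with high parallelism, we design a new parallel computing model and give an optimization scheme for heterogeneous platforms. We test and analyze it on the Chinese Academy of Sciences supercomputing system "元", run performance tests on typical cases in the field, and focus on the scalability of heterogeneous CPU and MIC parallelism in offload mode at different scales.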

11.
Object-based parallel file systems have emerged as promising storage solutions for high-performance computing (HPC) systems. Although object storage provides a flexible interface, scheduling highly concurrent I/O requests that access a large number of objects remains a challenging problem, especially when stragglers (storage servers that are significantly slower than others) exist in the system. An efficient I/O scheduler needs to avoid possible stragglers to achieve low latency and high throughput. In this paper, we introduce log-assisted, straggler-aware I/O scheduling to mitigate the impact of storage server stragglers. The contribution of this study is threefold. First, we introduce a client-side, log-assisted, straggler-aware I/O scheduler architecture to tackle the storage straggler issue in HPC systems. Second, we present three scheduling algorithms, based on this architecture, that make efficient I/O scheduling decisions while avoiding stragglers. Third, we evaluate the proposed I/O scheduler using simulations, and the simulation results confirm the promise of the newly introduced straggler-aware I/O scheduler.
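The idea of a client-side, log-assisted, straggler-aware choice can be sketched as follows. This is not one of the paper's three algorithms, which the abstract does not describe; it is an illustrative heuristic that assumes the client logs recent per-server latencies and treats a server as a straggler when its mean latency exceeds a multiple of the median. All names (`find_stragglers`, `pick_server`, `factor`) are hypothetical.

```python
from statistics import mean, median


def find_stragglers(latency_log, factor=2.0):
    """Return servers whose mean logged latency exceeds factor x the median mean.

    latency_log maps a server id to a list of recently observed latencies.
    The threshold rule is an assumption made for this sketch.
    """
    means = {s: mean(v) for s, v in latency_log.items() if v}
    if not means:
        return set()
    cutoff = factor * median(means.values())
    return {s for s, m in means.items() if m > cutoff}


def pick_server(candidates, latency_log, factor=2.0):
    """Pick the candidate with the lowest mean latency, avoiding stragglers.

    Falls back to all candidates if every one of them is a straggler.
    Servers with no log entries are treated as having zero latency.
    """
    stragglers = find_stragglers(latency_log, factor)
    healthy = [s for s in candidates if s not in stragglers] or list(candidates)
    return min(healthy, key=lambda s: mean(latency_log.get(s) or [0.0]))
```

For example, with logs `{"a": [1, 1], "b": [1.2], "c": [10, 12]}`, server `"c"` is flagged and a request that could go to `"b"` or `"c"` is sent to `"b"`.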

12.
It is widely accepted that future HPC systems will be limited by their power consumption. Current HPC systems are built from commodity server processors, designed over years to achieve maximum performance, with energy efficiency being an afterthought. In this paper we advocate a different approach: building HPC systems from low-power embedded and mobile technology parts, designed over time for maximum energy efficiency, which now show promise for competitive performance. We introduce the architecture of Tibidabo, the first large-scale HPC cluster built from ARM multicore chips, and present a detailed performance and energy efficiency evaluation. We present the lessons learned for the design and improvement in energy efficiency of future HPC systems based on such low-power cores. Based on our experience with the prototype, we perform simulations to show that a theoretical cluster of 16-core ARM Cortex-A15 chips would increase the energy efficiency of our cluster by 8.7×, reaching an energy efficiency of 1046 MFLOPS/W.
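The two figures in this abstract imply a baseline efficiency for the prototype. The short calculation below only divides the projected efficiency by the stated improvement factor; the function name is hypothetical and the result is derived arithmetic, not a number quoted from the abstract.

```python
def implied_baseline(projected_mflops_per_watt, improvement_factor):
    """Baseline efficiency implied by a projected value and its improvement factor."""
    return projected_mflops_per_watt / improvement_factor


# Figures from the abstract: 1046 MFLOPS/W after an 8.7x improvement.
baseline = implied_baseline(1046.0, 8.7)
print(f"{baseline:.1f} MFLOPS/W")  # prints "120.2 MFLOPS/W"
```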

13.
Simulation is an important method for evaluating future computer systems. Microprocessor architecture has shifted to parallelism, but almost all simulators remain sequential, so the advantages brought by multi-core or many-core processors cannot be exploited. This paper presents a parallel simulator engine (SimK) for the prevalent SMP/CMP platform, aimed at large-scale, fine-grained computer system simulation. The paper introduces the highly efficient synchronization, communication, and buffer management policies used in SimK, and presents a novel lock-free scheduling mechanism that avoids using any atomic instructions. To deal with load fluctuation under light load, a cooperative dynamic task migration scheme is proposed. Based on SimK, we have developed the large-scale parallel simulators HppSim and HppNetSim, which simulate a full supercomputer system and its interconnection network, respectively. Results show that HppSim and HppNetSim both gain sound speedup with multiple processors, and the best normalized speedup reaches 14.95X on a two-way quad-core server.

14.
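This paper introduces the basics and development of SIMNOWs, a network-based software platform for distributed cluster parallel simulation. A master server and node machines form a LAN-based architecture for distributed cluster parallel simulation. As the core of the whole system, the server-side management system comprises a node management system and a job management system, designed around two daemon processes, and uses dynamic preemptive load balancing to address load problems in parallel execution.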

15.
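Simulation models are growing more complex, and, limited by the computing power and storage capacity of a single machine, simulations take longer and longer. PDES (Parallel Discrete Event Simulation) strategies can speed up simulation programs and were therefore once a research hotspot. Parallel simulation has nevertheless not been widely adopted in industry, because parallel simulation modeling theory is lacking, parallel simulation performance is unpredictable, and parallel program behavior is unpredictable. After discussing general methods for parallelizing simulators, this paper presents SensorSSF, an SSF-based parallel simulation environment for sensor networks. SensorSSF's design follows two principles: scalability and simplicity. Scalability ensures that CPU execution time grows linearly with problem size and simulation model complexity; simplicity lets simulation users write efficient simulation programs without much knowledge of parallel programming. Experimental results show that SensorSSF scales well and has better time characteristics than NS2.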

16.
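To alleviate the I/O bottleneck, research can proceed along six lines: applications, scalable algorithms, compilers and languages, runtime libraries, operating systems, and architecture. Among these, the I/O architecture is the key support for all the technical approaches. Current parallel I/O performance analysis lacks a sound theoretical model to guide I/O architecture design. Addressing the scalability of parallel computer systems, this paper studies how I/O load affects scalability, builds an I/O-bounded parallel speedup performance model, and analyzes the scalability of three I/O architectures commonly used in today's large-scale parallel computer systems. On this theoretical basis, it proposes a scalable parallel I/O system architecture for high-performance computing. It also proposes several strategies that effectively reduce I/O operation service time, thereby enhancing system scalability and laying a foundation for follow-up research.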
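The abstract does not give its I/O-bounded speedup model. As a purely illustrative stand-in, the sketch below uses an Amdahl-style assumption: compute time divides evenly across processors while I/O time stays fixed. The function name and this cost model are assumptions, not the paper's formulation.

```python
def io_bounded_speedup(compute_time, io_time, processors):
    """Amdahl-style speedup when only compute time scales with processors.

    Assumed model (not taken from the paper):
        T(p) = compute_time / p + io_time
        S(p) = T(1) / T(p)
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_total = compute_time + io_time
    parallel_total = compute_time / processors + io_time
    return serial_total / parallel_total
```

Under this assumption the speedup can never exceed `1 + compute_time / io_time`, however many processors are added, which illustrates why reducing I/O service time matters for scalability.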

17.
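A framework structure for a multi-robot system simulation platform   Total citations: 1, self-citations: 0, citations by others: 1
陆波波, 黄鸿. 《计算机应用》 (Journal of Computer Applications), 2005, 25(5): 1029-1030
To address the high cost of experimental platforms for multi-robot systems and the tendency of their hardware to age and break, this paper proposes a framework structure for a simulation platform. The platform is based on a client/server structure, uses a kernel module to manage the other functional modules, and introduces the plug-in concept to build sensor plug-ins that simulate sensors effectively. Experiments demonstrate its ease of use and practicality.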

18.
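The combat command system is central to generating combat capability. How to build a test platform that tests the functions and performance indicators of the system software, and then evaluates system effectiveness, is a problem that must be studied and solved. This paper introduces the test requirements and design ideas of a simulation test platform for shipborne combat command systems, and focuses on the platform's architecture and implementation techniques. It analyzes the respective limitations of laboratory testing and sea-range testing and proposes an approach that combines the two. Taking radar as an example, it describes how the equipment simulator is implemented and gives a technical scheme for modeling the effects of natural and electronic interference on radar detection capability.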

19.
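A parallel operating system based on multi-virtual-space multiple-mapping technology   Total citations: 3, self-citations: 0, citations by others: 3
陈左宁, 金怡濂. 《软件学报》 (Journal of Software), 2001, 12(10): 1562-1568
The scalability of high-performance computer systems is a major design challenge, and NUMA (non-uniform memory architecture) was proposed precisely to solve the scalability problem of shared-memory systems. Research and practice show that the scalability of the whole system is closely tied to the structure of the operating system. Typical multiprocessor operating systems usually adopt one of two structures: a sharing-based single-kernel structure or a message-based multi-kernel structure. Analysis leads to the conclusion that neither structure suits the needs of scalable parallel machines well, especially NUMA parallel machines. To address these problems, a new structural design idea is proposed: combining multi-virtual-space multiple mapping with active messages. Test and run results show that this structure successfully solves the system's scalability problem.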

20.
The advent of unprecedentedly scalable yet energy-hungry Exascale supercomputers poses a major challenge in sustaining a high performance-per-watt ratio. With I/O management acquiring a crucial role in supporting scientific simulations, various I/O management approaches have been proposed to achieve high performance and scalability. However, the details of how these approaches affect energy consumption have not been studied yet. Therefore, this paper aims to explore how much energy a supercomputer consumes while running scientific simulations when adopting various I/O management approaches. In particular, we closely examine three radically different I/O schemes: time partitioning, dedicated cores, and dedicated nodes. To do so, we implement the three approaches within the Damaris I/O middleware and perform extensive experiments with one of the target HPC applications of the Blue Waters sustained-petaflop supercomputer project: the CM1 atmospheric model. Our experimental results, obtained on the French Grid’5000 platform, highlight the differences among these three approaches and illustrate how various configurations of the application and of the system can impact performance and energy consumption. Moreover, we propose and validate a mathematical model that estimates the energy consumption of an HPC simulation under different I/O approaches. Our proposed model gives hints for pre-selecting the most energy-efficient I/O approach for a particular simulation on a particular HPC system and therefore provides a step towards energy-efficient HPC simulations in Exascale systems. To the best of our knowledge, our work provides the first in-depth look into the energy-performance tradeoffs of I/O management approaches.
