
1.

SIMCAN: A flexible, scalable and expandable simulation platform for modelling and simulating distributed architectures and applications

Alberto Núñez Javier Fernández Rosa Filgueira Félix García Jesús Carretero 《Simulation Modelling Practice and Theory》2012,20(1):12-32

In this paper we propose a new simulation platform called SIMCAN, for analyzing parallel and distributed systems. This platform is intended for testing parallel and distributed architectures and applications. The main characteristics of SIMCAN are flexibility, accuracy, performance, and scalability. Accordingly, the proposed platform has a modular design that eases the integration of different basic systems on a single architecture. Its design follows a hierarchical schema that includes simple modules, basic systems (computing, memory managing, I/O, and networking), physical components (nodes, switches, …), and aggregations of components. New modules may also be incorporated to include new strategies and components. In addition, a graphical configuration tool has been developed to help untrained users with the task of modelling new architectures. Finally, a validation process and some evaluation tests have been performed to evaluate the SIMCAN platform.

2.

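Research on server cluster technology for network application systems (total citations: 7; self-citations: 0; citations by others: 7)

罗清 罗宇 《计算机工程与科学》2004,26(7):37-40

From the perspectives of scalable processing capacity and fault tolerance, this paper presents function-distributed and symmetric cluster architectures for network application servers, and discusses various cluster architectures that provide fault tolerance for storage components.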

3.

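HPP: An architecture supporting high-performance and utility computing (total citations: 3; self-citations: 0; citations by others: 3)

孙凝晖 李凯 陈明宇 《计算机学报》2008,31(9)

To meet the technical challenges of petaflops-scale high-performance computing and, at the same time, the needs of utility computing, the main future application model of data centers, this paper proposes a high-performance computer architecture called HPP (Hyper Parallel Processing). The main features of HPP are a global address space and hyper nodes with a single operating system image. HPP combines the scalability of MPP, the efficient communication of DSM, and the commodity appeal of clusters, and it offers many opportunities for innovative research in both high-performance computing and utility computing. Based on the HPP architecture, a prototype of the 曙光5000 (Dawning 5000) high-performance computer was implemented, which provides an initial validation of the architecture's feasibility.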

4.

Towards application-level elasticity on shared cluster: an actor-based approach

Donggang Cao Lianghuan Kang Hanglong Zhan Hong Mei 《Frontiers of Computer Science》2017,11(5):803-820

In current cluster computing, several distributed frameworks are designed to support elasticity for business services adapting to environment fluctuation. However, most existing works support elasticity mainly at the resource level, leaving the problem of application-level elasticity support to domain-specific frameworks and applications. This paper proposes an actor-based general approach to support application-level elasticity for multiple cluster computing frameworks. The actor model offers scalability and decouples language-level concurrency from the runtime environment. By extending actors, a new middle layer called Unisupervisor is designed to “sit” between the resource management layer and application framework layer. Actors in Unisupervisor can automatically distribute and execute tasks over clusters and dynamically scale in/out. Based on Unisupervisor, high-level profiles (MasterSlave, MapReduce, Streaming, Graph, and Pipeline) for diverse cluster computing requirements can be supported. The entire approach is implemented in a prototype system called UniAS. In the evaluation, both benchmarks and real applications are tested and analyzed in a small-scale cluster. Results show that UniAS is expressive and efficiently elastic.

5.

Scaling to a million cores and beyond: Using light-weight simulation to understand the challenges ahead on the road to exascale

《Future Generation Computer Systems》2014

As supercomputers scale to 1000 PFlop/s over the next decade, investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices for high-performance computing (HPC) hardware/software co-design is crucial. This paper summarizes recent efforts in designing and implementing a novel HPC hardware/software co-design toolkit. The presented Extreme-scale Simulator (xSim) permits running an HPC application in a controlled environment with millions of concurrent execution threads while observing its performance in a simulated extreme-scale HPC system using architectural models and virtual timing. This paper demonstrates the capabilities and usefulness of the xSim performance investigation toolkit, such as its scalability to 2²⁷ simulated Message Passing Interface (MPI) ranks on 960 real processor cores, the capability to evaluate the performance of different MPI collective communication algorithms, and the ability to evaluate the performance of a basic Monte Carlo application with different architectural parameters.

6.

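Evaluating the performance of the 曙光 (Sugon) exascale prototype using the GTC-P application

王一超 胡航 William Tang 王蓓 林新华 《计算机工程与科学》2020,42(1):1-7

The 曙光 (Sugon) exascale prototype is one of the three prototype systems under China's 13th Five-Year Plan. It adopts a heterogeneous computing architecture whose CPUs and accelerators use the domestic 海光 (Hygon) processor architecture licensed from AMD. In addition to testing the chips with benchmark programs, the authors ported the laser plasma application GTC-P to explore how a real application performs on the prototype, compared the performance of GTC-P on the Hygon CPU and DCU with that on the Intel 6148 CPU and NVIDIA V100 GPU, and carried out a scalability analysis across multiple nodes of the prototype. The evaluation reflects the actual performance of high-performance computing applications on the Sugon exascale prototype.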

7.

Cost-oriented proactive fault tolerance approach to high performance computing (HPC) in the cloud

Ifeanyi P. Egwutuoha Shiping Chen David Levy Bran Selic Rafael Calvo 《International Journal of Parallel, Emergent and Distributed Systems》2014,29(4):363-378

Cloud computing offers new computing paradigms, capacity and flexible solutions to high performance computing (HPC) applications. For example, Hardware as a Service (HaaS) allows users to provision a large number of virtual machines (VMs) for computation-intensive applications. Due to the large number of VMs and electronic components in HPC systems in the cloud, any fault during execution would result in re-running the applications, which costs time, money and energy. In this paper we present a proactive fault tolerance (FT) approach to HPC systems in the cloud to reduce the wall-clock execution time and dollar cost in the presence of faults. We also develop a generic FT algorithm for HPC systems in the cloud. Our algorithm does not rely on a spare node prior to prediction of a failure. We also develop a cost model for executing computation-intensive applications on HPC systems in the cloud. We analyse the dollar cost of provisioning spare nodes and checkpointing FT to assess the value of our approach. Our experimental results, obtained from a real cloud execution environment, show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be reduced by as much as 30%. The frequency of checkpointing of computation-intensive applications can be reduced by up to 50% with our FT approach for HPC in the cloud compared with current FT approaches.

8.

Designing efficient irregular networks for heterogeneous systems-on-chip

Christian Norbert 《Journal of Systems Architecture》2008,54(3-4):384-396

Networks-on-chip will serve as the central integration platform in future complex systems-on-chip (SoC) designs, composed of a large number of heterogeneous processing resources. Most researchers advocate the use of traditional regular networks like meshes, tori or trees as architectural templates, which have gained high popularity in general-purpose parallel computing. However, most SoC platforms are special-purpose designs tailored to the domain-specific requirements of their application. They are usually built from a large diversity of heterogeneous components which communicate in a very specific, mostly irregular way.

In this work, we propose a methodology for the design of customized irregular networks-on-chip, called INoC. We take advantage of a priori knowledge of the communication characteristics of the application to generate an optimized network topology and routing algorithm. We show that, for irregular workloads, customized irregular networks are clearly superior to traditional regular architectures in terms of performance at comparable implementation costs. Moreover, they inherently offer a high degree of scalability and expansibility, which allows the network to be adapted to an arbitrary number of nodes with a given communication demand. This normally cannot be accomplished by traditional approaches.

9.

Performance Evaluation of Mixed-Mode OpenMP/MPI Implementations

J. Mark Bull James Enright Xu Guo Chris Maynard Fiona Reid 《International journal of parallel programming》2010,38(5-6):396-417

With the current prevalence of multi-core processors in HPC architectures, mixed-mode programming, using both MPI and OpenMP in the same application, is seen as an important technique for achieving high levels of scalability. As there are few standard benchmarks written in this paradigm, it is difficult to assess the likely performance of such programs. To help address this, we examine the performance of mixed-mode OpenMP/MPI on a number of popular HPC architectures, using a synthetic benchmark suite and two large-scale applications. We find performance characteristics which differ significantly between implementations, and which highlight possible areas for improvement, especially when multiple OpenMP threads communicate simultaneously via MPI.

10.

A Simulation-Based Scalability Study of Parallel Systems

《Journal of Parallel and Distributed Computing》1994,22(3):411-426

Scalability studies of parallel architectures have used scalar metrics to evaluate their performance. Very often, it is difficult to glean the sources of inefficiency resulting from the mismatch between the algorithmic and architectural requirements using such scalar metrics. Low-level performance studies of the hardware are also inadequate for predicting the scalability of the machine on real applications. We propose a top-down approach to scalability study that alleviates some of these problems. We characterize applications in terms of the frequently occurring kernels and their interaction with the architecture in terms of overheads in the parallel system. An overhead function is associated with each artifact of the parallel system that limits its scalability. We present a simulation platform called SPASM (Simulator for Parallel Architectural Scalability Measurements) that quantifies these overhead functions. SPASM separates the algorithmic overhead into its components (such as serial and work-imbalance overheads), and interaction overhead into its components (such as latency and contention). Such a separation is novel and has not been addressed in any previous study using real applications. We illustrate the top-down approach by considering a case study in implementing three NAS parallel kernels on two simulated message-passing platforms.

11.

High level modeling and automated generation of heterogeneous SoC architectures with optimized custom reconfigurable cores and on-chip communication media

Balal Ahmad Ali Ahmadinia Tughrul Arslan 《Journal of Systems Architecture》2010,56(11):597-615

In this paper we propose a framework for modeling and automated generation of heterogeneous SoC architectures with emphasis on reconfigurable component integration and optimized communication media. In order to facilitate rapid development of SoC architectures, communication-centric platforms for data-intensive applications, high-level modeling of reconfigurable components for quick simulation, and a tool for generation of complete SoC architectures are presented. Four different communication-centric platforms based on traditional bus, crossbar, hierarchical bus and novel hybrid communication media are proposed. These communication-centric platforms are proposed to cater for the different communication requirements of future SoC architectures. Multi-standard telecommunication is chosen as our target application domain, and a case study of WiMAX is used as a real-world example to demonstrate the effectiveness of our approach. A system consisting of an ARM processor, a reconfigurable FFT and a reconfigurable Viterbi decoder is considered, with the option of system scalability for future upgrades. The behavior of the system with different communication platforms is analyzed for its throughput and power characteristics under different reconfigurable scenarios to show the effectiveness of our approach.

12.

Overcoming data locality: An in-memory runtime file system with symmetrical data distribution

《Future Generation Computer Systems》2016

In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage.
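The idea of striping a file across all nodes with a hash function can be illustrated with a minimal sketch. The abstract does not give MemFS's hash function, stripe size, or storage layout, so everything below (SHA-1 placement, the `stripe_node`, `write_striped` and `read_striped` helpers, and the dictionary standing in for node memory) is a hypothetical illustration, not the MemFS implementation.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical in-memory "cluster": node name -> {(path, stripe index): bytes}.
Store = Dict[str, Dict[Tuple[str, int], bytes]]


def stripe_node(path: str, index: int, nodes: List[str]) -> str:
    """Pick the node for one stripe by hashing (path, stripe index).

    Placement depends only on the key, not on which node wrote the data,
    so the distribution is symmetrical rather than locality-based.
    """
    digest = hashlib.sha1(f"{path}:{index}".encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def write_striped(path: str, data: bytes, nodes: List[str],
                  stripe_size: int, store: Store) -> int:
    """Split data into fixed-size stripes and place each one; return the stripe count."""
    count = 0
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        node = stripe_node(path, index, nodes)
        store.setdefault(node, {})[(path, index)] = data[offset:offset + stripe_size]
        count += 1
    return count


def read_striped(path: str, count: int, nodes: List[str], store: Store) -> bytes:
    """Reassemble a file by recomputing the location of every stripe."""
    return b"".join(
        store[stripe_node(path, i, nodes)][(path, i)] for i in range(count)
    )
```

Because any node can recompute where a stripe lives, no central metadata lookup is needed in this sketch; a real system would also have to handle node membership changes, which are ignored here.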

13.

Towards highly available and scalable high performance clusters

Azzedine Boukerche Raed A. Al-Shaikh Mirela Sechi Moretti Annoni Notare 《Journal of Computer and System Sciences》2007,73(8):1240-1251

In recent years, we have witnessed a growing interest in high performance computing (HPC) using a cluster of workstations. This growth has made it affordable for individuals to have exclusive access to their own supercomputers. However, one of the challenges in a clustered environment is to keep system failures to a minimum and to achieve the highest possible level of system availability. High-Availability (HA) computing attempts to avoid the problems of unexpected failures through active redundancy and preemptive measures. Since the prices of hardware components are dropping significantly, we propose to combine both HPC and HA concepts and lay out the design of an HA-HPC cluster, considering all possible measures. In particular, we explore the hardware and the management layers of the HA-HPC cluster design, as well as a more focused study on the parallel-applications layer (i.e. FT-MPI implementations). Our findings show that combining HPC and HA architectures is feasible, yielding an HA cluster that can be used for high performance computing.

14.

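Design of a database scaling strategy in cloud environments

周文琼 王乐球 郑述招 《微机发展》2014,(9):213-216

To address the ever-growing scale of application systems in cloud environments, this paper proposes a database server scaling model with good scalability. The model architecture has three layers: a logical SQL processing layer, a DA and CP layer, and a physical database layer. It employs read/write splitting, database replication, load balancing, and server clustering, and it proposes a weighted consistent-hashing load-balancing algorithm based on virtual nodes, in which the number of virtual nodes assigned to each physical node is computed from that node's performance weight. Simulation experiments show that the model has an advantage in load-balancing performance and scales well at the database layer.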
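Weighted consistent hashing with virtual nodes can be sketched as follows. The abstract does not state the paper's hash function or how the weight is converted to a virtual-node count, so the MD5 hash, the `vnodes_per_weight` factor, and the `WeightedHashRing` class are assumptions made only for illustration.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class WeightedHashRing:
    """Consistent-hash ring where each physical node owns weight * k virtual nodes."""

    def __init__(self, weights: Dict[str, int], vnodes_per_weight: int = 50):
        points: List[Tuple[int, str]] = []
        for node, weight in weights.items():
            # A higher performance weight yields proportionally more virtual nodes.
            for i in range(weight * vnodes_per_weight):
                points.append((_ring_hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points
        self._hashes = [h for h, _ in points]

    def vnode_count(self, node: str) -> int:
        """Number of virtual nodes currently owned by a physical node."""
        return sum(1 for _, owner in self._points if owner == node)

    def lookup(self, key: str) -> str:
        """Return the physical node owning the first virtual node clockwise from the key."""
        idx = bisect.bisect_right(self._hashes, _ring_hash(key)) % len(self._hashes)
        return self._points[idx][1]
```

With weights such as `{"db1": 1, "db2": 3}`, `db2` owns three times as many virtual nodes and therefore receives roughly three times as many keys on average; the exact split depends on the hash values.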

15.

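A memory-sharing grid system based on memory services (total citations: 1; self-citations: 0; citations by others: 1)

褚瑞 肖侬 卢锡城 《计算机学报》2006,29(7):1225-1233

Memory-intensive applications place strict demands on the physical memory of their runtime environment; when physical memory is insufficient, they trigger heavy disk I/O and degrade system performance. Traditional network memory tries to solve this problem by sharing the physical memory of idle nodes within a cluster, but it is strongly affected by cluster load and the internal network. By combining network memory with service computing and grid computing, this paper proposes a memory-sharing grid system based on memory services, called the memory grid, and analyzes and discusses the key techniques and algorithms for implementing memory services. The memory grid compensates for the shortcomings of network memory and broadens the application scope of grid computing. Simulations based on the runtime behavior of real applications show that the memory grid outperforms network memory.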

16.

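An improved dynamic load-balancing strategy for Web services in virtualized environments (total citations: 1; self-citations: 0; citations by others: 1)

刘胜楠 汪诗林 《计算机工程与科学》2015,37(9):1607-1613

To improve the scalability and automation of Web service clusters, this paper studies cluster systems from the two angles of virtualization and load balancing, improves existing load-collection strategies, and designs and implements XCluster, a model that automatically controls cluster size according to load values. The new model runs in the virtualization environment provided by Xen and monitors the load of the host layer and the virtual machine layer in real time. As the total load of the cluster grows, it gradually brings in new virtual machines to expand the cluster while distributing tasks sensibly across the virtual machine nodes; when the total load falls, it gradually shuts down virtual machines to shrink the cluster, and the freed hardware resources can be offered to other cluster systems. Theoretical analysis and experimental results show that XCluster needs very little network traffic to collect information and issue commands, makes full use of the manageability of virtual machines to schedule back-end nodes, and, for the same total workload, uses as few cluster nodes as possible to execute tasks.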
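The grow-and-shrink behaviour described in this abstract can be sketched as a simple threshold rule. The abstract gives no thresholds, capacities, or dispatch policy, so the parameter values and the `target_vm_count` and `pick_vm` functions below are hypothetical and do not reproduce XCluster's actual algorithm.

```python
from typing import Dict


def target_vm_count(total_load: float, current_vms: int,
                    capacity_per_vm: float = 100.0,
                    high: float = 0.8, low: float = 0.3,
                    min_vms: int = 1, max_vms: int = 16) -> int:
    """Return the VM count for the next control step.

    The cluster changes by at most one VM per step, mirroring the
    "gradually" expand/shrink behaviour; all thresholds are assumed values.
    """
    utilization = total_load / (current_vms * capacity_per_vm)
    if utilization > high and current_vms < max_vms:
        return current_vms + 1   # total load is high: start one more VM
    if utilization < low and current_vms > min_vms:
        return current_vms - 1   # total load is low: shut one VM down
    return current_vms           # within the comfortable band: no change


def pick_vm(loads: Dict[str, float]) -> str:
    """Dispatch the next task to the least-loaded VM (an assumed policy)."""
    return min(loads, key=loads.get)
```

For example, with four VMs of capacity 100 each, a total load of 350 gives a utilization of 0.875 and triggers a scale-out to five VMs, while a total load of 100 gives 0.25 and triggers a scale-in to three.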

17.

Accelerating big data analytics on HPC clusters using two-level storage

《Parallel Computing》2017

Data-intensive applications that are inherently I/O bound have become a major workload on traditional high-performance computing (HPC) clusters. Simply employing data-intensive computing storage such as HDFS or using parallel file systems available on HPC clusters to serve such applications incurs performance and scalability issues. In this paper, we present a novel two-level storage system that integrates an upper-level in-memory file system with a lower-level parallel file system. The former renders memory-speed high I/O performance and the latter renders consistent storage with large capacity. We build a two-level storage system prototype with Tachyon and OrangeFS, and analyze the resulting I/O throughput for typical MapReduce operations. Theoretical modeling and experiments show that the proposed two-level storage delivers higher aggregate I/O throughput than HDFS and OrangeFS and achieves scalable performance for both read and write. We expect this two-level storage approach to provide insights on system design for big data analytics on HPC clusters.

18.

Utilizing commodity hardware and software to distribute a real‐world application: maximizing reuse while improving performance

Michael Davis Randy Smith Brandon Dixon Allen Parrish David Cordes 《Software》2005,35(7):621-641

Commodity computing hardware continues to increase performance while decreasing price. This combination is driving a renewed interest in parallel and distributed computing. In this study, we examine the performance of an existing application in a ten‐node computing cluster using commodity off‐the‐shelf components. The application is a statistical analysis software package that processes categorical data used by state public safety programs. The study examines various network topologies and focuses on minimizing the software modifications required to distribute the application. We conclude that parallel computing using commodity components is an effective mechanism to increase the performance of real‐world applications, especially when the underlying application architectures have the flexibility to support efficient reuse of the existing code. Copyright © 2005 John Wiley & Sons, Ltd.

19.

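Performance analysis of memory-access-intensive applications on SMP cluster systems

顾丽红 吴少刚 《小型微型计算机系统》2006,27(7):1258-1261

SMP cluster systems, with their good cost-performance ratio and excellent scalability and availability, are gradually becoming the mainstream architecture in high-performance computing. This two-level hybrid structure, with shared memory within a node and message passing between nodes, is a current focus of parallel computing research. Within a single SMP node, whether the bus and memory bandwidth can meet the demands of the CPUs and I/O strongly affects the performance of memory-access-intensive applications. Based on the characteristics of such applications, this paper tests and analyzes the impact of memory access contention on system performance in an SMP cluster. The results show that the SMP nodes under test have a performance bottleneck; this quantitative analysis offers useful guidance for designing large-scale SMP-based cluster systems.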

20.

Prospects for virtualization of high-performance x64 systems

A. O. Kudryavtsev V. K. Koshelev A. I. Avetisyan 《Programming and Computer Software》2013,39(6):285-294

Prospects for applying virtualization technology in high-performance computations on the x64 systems are studied. Principal reasons for performance degradation when parallel programs are running in virtual environments are considered. The KVM/QEMU and Palacios virtualization systems are considered in detail, with the HPC Challenge and NAS Parallel Benchmarks used as benchmarks. A modern computing cluster built on the Infiniband high-speed interconnect is used in testing. The results of the study show that, in general, virtualization is reasonable for a wide class of high-performance applications. Fine tuning of the virtualization systems involved made it possible to reduce overheads from 10–60% to 1–5% on the majority of tests from the HPC Challenge and NAS Parallel Benchmarks suites. The main bottlenecks of virtualization systems are reduced performance of the memory system (which is critical only for a narrow class of problems), costs associated with hardware virtualization, and the increased noise caused by the host operating system and hypervisor. Noise can have a negative effect on performance and scalability of fine-grained applications (applications with frequent small-scale communications). The influence of noise significantly increases as the number of nodes in the system grows.